ABSTRACT


ONBOARD MULTICHANNEL 
DEMULTIPLEXER/DEMODULATOR STUDY 

NAS3-24885 

S. JOSEPH CAMPANELLA 
AND 

SOHIEL SAYEGH

COMSAT LABORATORIES 

CLARKSBURG, MARYLAND 

An investigation, performed for NASA LeRC by COMSAT Laboratories, of a digitally implemented
on-board demultiplexer/demodulator that can process a mix of uplink carriers of differing
bandwidths and center frequencies, and that is programmable in orbit to accommodate variations in traffic
flow, is reported. The processor accepts high speed samples of the signal carried in a wideband
satellite transponder channel, processes them as a composite to determine the signal spectrum,
filters the result into the individual channels that carry modulated carriers, and demodulates these to
recover their digital baseband content. The processor is implemented using forward and inverse
pipeline Fast Fourier Transform techniques. The recovered carriers are then demodulated
by a single digitally implemented demodulator that processes all of the modulated carriers. The
effort has determined the feasibility of the concept with multiple TDMA carriers, identified critical
path technologies, and assessed the potential of developing these technologies to a level capable of
supporting a practical, cost effective on-board implementation. The approach is referred to as a
flexible, high speed, digitally implemented Fast Fourier Transform (FFT) bulk
demultiplexer/demodulator.
NASA CONTRACT NAS3-24885

ON-BOARD MULTICHANNEL DEMULTIPLEXER/DEMODULATOR STUDY 

FINAL REPORT 


1.0 INTRODUCTION 

The purpose of this study is to investigate an
on-board demultiplexer/demodulator concept, determine its feasibility
with TDMA in a multifrequency environment, identify critical path
technologies, and assess the potential of developing these technologies to
a level capable of supporting a practical, cost effective on-board
implementation. The approach is to use a flexible, high speed,
digitally implemented Fast Fourier Transform (FFT)
demultiplexer/demodulator.

A functional diagram of a complete on-board baseband processor is
shown in Figure 1.1. The portion of this processor considered for digital
implementation in this study is outlined by the dashed box. Digital
implementation provides flexibility that permits the on-board processor to
accommodate different types of multichannel FDMA or TDMA/FDMA digital
service simply by changing its computation rules and organization. This
can be done from the ground by sending new programming instructions to
the on-board processor. For example, one wideband processor can be
programmed to demultiplex and demodulate hundreds of narrow bandwidth
digital carrier channels, while another does the same with tens of wide
bandwidth digital carrier channels and yet another with a mix
of wide and narrowband carrier channels. The rules and
organization can also easily be changed to accommodate variations in the
service over the lifetime of the satellite or to accommodate different
applications of the same type of satellite in different locations around the
earth. This flexibility is the central feature of the concept.

The objective of the study is to determine the details of digital
implementation of the demultiplexer and the demodulators and to assess
the feasibility of constructing such processors in the future. In this
respect, an important part of the effort is a review of the advances that
can be expected in the key digital component areas in terms
of the size, power, weight, speed and radiation hardness of the digital logic
and memory components from which the processor is to be fabricated.

Critical technology areas in which R&D effort should be invested to
achieve an efficient and practical on-board implementation are also identified.
The processor is envisioned as operating in wideband channels of
fixed bandwidth, similar to the transponder channels used in
existing satellites. The wideband channel input signals, which arrive at
their assigned RF carrier frequencies at the front end, are down converted so
that their carrier frequency is at zero Hz at the input to the demultiplexer.

A multiplicity of such wideband channels would occupy the spectrum
assigned to the service. For the purpose of this study, a wideband channel
bandwidth of 40 MHz has been chosen because it is typical of the transponders
used in today's satellite systems. The wideband channel signal can be
sampled in either real or complex form, as illustrated in Figure 1.2. For
real sampling, the channel is sampled at twice the wideband channel
bandwidth, as shown in part (a) of Figure 1.2. For complex sampling, the
signal is divided into direct and quadrature paths, as shown in part (b) of
Figure 1.2. In this case the sampling rate is equal to the channel
bandwidth, i.e., 40 Msamp/s. Because complex sampling operates at a
lower sampling rate, it is easier to implement. It is also inherently better suited
to the FFT processing structures that are used extensively in this
investigation. If the technology permits, extension to higher sampling
rates, and consequently wider channel bandwidths, is straightforward. For example,
processing of 80 MHz channel bandwidths can be expected in the
future.
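The advantage of complex (direct and quadrature) sampling can be illustrated with a small sketch that is not taken from the report. It samples a tone offset by +5 MHz or -5 MHz from the band center at 40 Msamp/s and takes a 64-point DFT; the tone offsets and the DFT length are arbitrary illustrative choices. With complex samples the two offsets land in different bins, so the whole 40 MHz band is resolved at 40 Msamp/s. With real samples the spectrum is conjugate symmetric and the two offsets cannot be told apart; only half of the bins carry independent information, which is why real sampling needs twice the rate.

```python
import cmath
import math

FS = 40e6  # complex sampling rate equal to the 40 MHz channel bandwidth
N = 64     # illustrative DFT length (assumption, chosen for brevity)

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def peak_bins(x, threshold=0.5):
    """Return the DFT bins whose magnitude exceeds threshold * largest magnitude."""
    mags = [abs(c) for c in dft(x)]
    top = max(mags)
    return [k for k, m in enumerate(mags) if m > threshold * top]

def complex_tone(f_hz):
    # I/Q samples of a tone offset f_hz from the band center
    return [cmath.exp(2j * math.pi * f_hz * n / FS) for n in range(N)]

def real_tone(f_hz):
    # real samples of the same tone: only the in-phase component is kept
    return [math.cos(2 * math.pi * f_hz * n / FS) for n in range(N)]

print(peak_bins(complex_tone(+5e6)))  # [8]
print(peak_bins(complex_tone(-5e6)))  # [56]  (bin -8 wraps to N - 8)
print(peak_bins(real_tone(+5e6)))     # [8, 56]
print(peak_bins(real_tone(-5e6)))     # [8, 56]
```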

The down converted baseband signal is processed by a forward FFT to
determine the spectrum distribution within the wideband channel in terms
of discrete Fourier coefficients. Next, these coefficients are processed by
a digital filter to select a particular channel, and the resulting
coefficients are processed by a demodulator to recover the bits of
the digital signal. Details of the arithmetic and its implementation
constitute a large portion of the report that follows. Demodulation is
described in detail for QPSK modulation, and extensions for accommodating
other modulation formats such as offset QPSK and 8-PSK are indicated.


The remainder of the report is divided into five sections, each covering a
major area of concern:

SECTION 2.0 - DEMULTIPLEXER IMPLEMENTATION. 

This section presents the most efficient architecture for the 
implementation of the FFT algorithm and determines the size of the FFT 
that will be sufficient to meet the needs of the demultiplexing processing. 
SECTION 3.0 - RECOVERY OF THE TIME DOMAIN SAMPLES OF SELECTED
CHANNELS 


This section develops the rules for realizing the filters that suitably 
separate the communications carriers into their required bandwidths. 

These filters must be flexibly programmable to accommodate a wide 
variation in the number of carriers and their bandwidths. The output must 
be samples in the time domain that are suitable for the demodulation 
processing that follows. 

SECTION 4.0 - DIGITAL DEMODULATOR.

This section presents the demodulator architecture for extracting the 
baseband digital information from the filtered carriers. This requires 
processing to recover the carrier frequency and phase, the clock frequency 
and phase and the information. 

SECTION 5.0 - TECHNOLOGY SURVEY. 

Based on the detailed processing architecture and requirements
identified, current technology has been reviewed with respect to
its ability to meet the need, and new technology requiring
additional development has been identified. In particular, developments
from the VHSIC program are included.

SECTION 6.0 - RECOMMENDED DEVELOPMENT PROGRAM. 

Long term development requirements needed to fabricate space flight 
qualified operational hardware are identified. This identifies areas where 
future NASA sponsored research and development can be directed to 
realize a practical cost effective implementation. 


2.0 DEMULTIPLEXER IMPLEMENTATION

2.1 DEMULTIPLEXER IMPLEMENTATION WITH A PIPELINE FFT AND AN IDFT 

2.1.1 GENERAL 

The demultiplexer comprises a forward FFT, implemented using a
pipeline architecture, which decomposes the input wideband spectrum into
discrete Fourier coefficients; a channel filter that selects
the coefficients lying in the wanted channel; and an IDFT that
reconstructs the time domain samples from the filtered coefficients. This
arrangement is suitable for demultiplexing multiple carriers when
they all have the same bandwidth, but it consumes too much power for
demultiplexing carriers of mixed bandwidths. The next section shows
that using an IFFT followed by an interpolating filter
to reconstruct the time domain samples greatly reduces the
computational intensity and power required by the sample reconstruction
process when mixed carrier bandwidths are involved.
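The FFT, channel filter and IDFT chain described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: a naive O(N²) DFT stands in for the pipeline FFT, the channel filter is an ideal rectangular mask rather than the shaped filters developed in the report, and the two carriers are hypothetical single tones. Renumbering the selected bins about the channel center is what places the recovered channel at zero Hz.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive DFT; inverse=True gives the IDFT, including the 1/N scale."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def demultiplex(samples, center_bin, half_width):
    """Toy FFT -> channel filter -> IDFT chain for one channel.

    The filter is an ideal rectangular mask of 2*half_width+1 bins (an
    assumption). Selected bins are renumbered about center_bin, so the
    channel is returned at baseband (zero Hz).
    """
    n = len(samples)
    spectrum = dft(samples)
    baseband = [0j] * n
    for rel in range(-half_width, half_width + 1):
        baseband[rel % n] = spectrum[(center_bin + rel) % n]
    return dft(baseband, inverse=True)

N = 64
# two hypothetical carriers: a tone at bin 10 (amplitude 1) and one at bin -20 (amplitude 0.5)
composite = [cmath.exp(2j * math.pi * 10 * m / N) + 0.5 * cmath.exp(-2j * math.pi * 20 * m / N)
             for m in range(N)]
channel = demultiplex(composite, center_bin=10, half_width=3)
print(round(abs(channel[0]), 6), round(abs(channel[5]), 6))  # 1.0 1.0
```

The recovered channel is a constant (a tone at zero Hz) and the second carrier is completely rejected, because its bin lies outside the selected set.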

A pipeline architecture is selected as the most efficient way to
implement the conversion of the wideband input signal into the discrete
spectrum coefficients needed to demultiplex individual carrier channels. It
can readily be implemented in hardware operated under
microprocessor stored program control, so that the composition of the
demultiplexed carrier channels can be changed.

The FFT pipeline architecture shown in Figure 2.1 has a number of 
important advantages for the implementation of the onboard 
demultiplexer. These are: 

1) Its pipeline architecture is suited to high speed operation because 
it inherently distributes the processing among many separate processing 
functions. 

2) In contrast with a parallel architecture, which may also be able to
operate at high speed, it requires far less memory (2 to 3 times less).

3) It yields a compact structure, i.e. one that does not have an 
excessive number of branches and is therefore well suited for hardware 
implementation. 
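The stage-by-stage organization that a pipeline FFT maps onto hardware can be seen in the radix-2 decimation-in-frequency algorithm sketched below. It is an algorithmic sketch, not a model of the delay-memory pipeline of Figure 2.1: each pass of the outer loop corresponds to one butterfly stage, each stage performs N/2 butterflies, and a 16-point transform therefore uses log2(16) = 4 stages.

```python
import cmath
import math

def fft_by_stages(x):
    """Radix-2 decimation-in-frequency FFT computed one butterfly stage at a time.

    Returns (spectrum in natural order, number of stages, number of butterflies).
    """
    a = list(x)
    n = len(a)
    stages = n.bit_length() - 1
    if 1 << stages != n:
        raise ValueError("length must be a power of 2")
    butterflies = 0
    span = n // 2
    for _ in range(stages):                      # one pass = one butterfly stage
        for start in range(0, n, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * math.pi * j / (2 * span))  # twiddle factor
                u, v = a[start + j], a[start + j + span]
                a[start + j] = u + v               # complex add
                a[start + j + span] = (u - v) * w  # complex add, then complex multiply
                butterflies += 1
        span //= 2
    # the DIF output is in bit-reversed order; reorder it
    out = [0j] * n
    for i in range(n):
        rev = int(format(i, f"0{stages}b")[::-1], 2) if stages else 0
        out[rev] = a[i]
    return out, stages, butterflies

x = [complex(n % 3, -n) for n in range(16)]  # arbitrary test input
X, stages, butterflies = fft_by_stages(x)
print(stages, butterflies)  # 4 32
```

In a pipeline implementation each pass becomes a separate hardware stage with its own butterfly unit and delay memories, so successive sample windows can flow through the stages concurrently.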


FIGURE 2.1. 16 POINT, RADIX 2 PIPELINE FFT
In the following, examples are given of the pipeline implementation
of an FFT processor operating on a 40 MHz wideband multicarrier input
signal for three different multicarrier compositions. These are:
1) demultiplexing 800 narrowband 64 kbit/s QPSK carriers,
2) demultiplexing 24 medium bandwidth 2.048 Mbit/s QPSK carriers, and
3) demultiplexing a mix of 400 narrowband 64 kbit/s and 12
medium bandwidth 2.048 Mbit/s carriers.

2.1.2 EXAMPLE 1, DEMULTIPLEXING OF 800 64KBIT/S CARRIERS 
2.1.2.1 BASIC PARAMETER SELECTION

• SAMPLING RATE- 40 MSAMP/S. This rate is established by the 
Nyquist sampling theorem which for a bandpass of W Hz requires W 
complex samples per second. It is assumed that each of the 800 64 kbit/s 
QPSK carriers is assigned to a channel of 45 kHz width. Thus 800 carriers 
would occupy a bandwidth of 36 MHz. To allow for realization of the 
anti-aliasing filter needed to select the occupied spectrum, the processed 
bandwidth must be greater. For this case it is assumed to be 40 MHz. Hence
the sampling rate is 40 Msamp/s. 

• DOWN CONVERSION- Theoretically, it is possible to sample the
signal directly at its carrier frequency, provided that the carrier has been
passed through the anti-aliasing filter and the sampling pulse width is
much smaller than a single period of the carrier. At the very
high frequencies used for satellite transmission, achieving a sufficiently
short sampling pulse width is impractical, and it is necessary to down
convert the carrier to a lower frequency. It is also necessary that the
relationship between the sampling frequency and the frequency at the
center of the original band being sampled be stable and maintained with
high accuracy. This requires that the local oscillators for the sampling and
down conversion processes have high accuracy; otherwise it will not be
possible to maintain the frequency alignment of the individual channels at
baseband. For the narrowband example considered here, the individual
channels have a width of 45 kHz and the accuracy should be approximately
1% of the width, or 450 Hz. Relative to an uplink band center of 30 GHz, this
requires an individual carrier and frequency conversion accuracy of 1.5x10^-8
(about one part in 6.7x10^7). The required accuracy is proportionately relaxed
for wider channel bandwidths or lower carrier frequencies.

In the down conversion process, it is important to select a suitable IF
for implementing a practical sampler and associated anti-aliasing filter. With
present technology, this is in the range up to 100 MHz with 8 bit
resolution. The IF can in fact be at zero Hz, a choice that eases the
sampler design, since the highest frequency that occurs is then half the channel
bandwidth and the sampling rate is equal to the channel bandwidth.

• SAMPLING WINDOW AND INPUT COEFFICIENTS- By the Nyquist sampling
theorem applied in the frequency domain, the spectrum must be sampled at
least once every B Hz, where B is the spacing between the individual
carriers being demultiplexed. This requires a sampling window at least 1/B
seconds long, which at W complex samples per second contains W/B samples:
one complex sample for each carrier to be demultiplexed in the band
being analyzed. These are the input coefficients to the FFT processor. For
the example considered here the number is 40x10^6 / 45x10^3 = 888.89
complex samples per window. However, this results in only one spectrum
sample per channel, which is insufficient to accurately represent a
suitable channel filter. Our simulations indicate that a practical design
free of operational constraints requires a sixteen-fold increase in the
number of samples and consequently in the width of the sampling window.

This results in 14,222 samples, which, when rounded up to the nearest
power of 2, yields 2^14 = 16384. To eliminate the undesirable consequences of
circular convolution, an "overlap and save" process with an overlap of
50% of the window width is performed, and the first half of the output
samples of each window, which suffer aliasing, is discarded.
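The parameter choices above reduce to simple arithmetic, sketched below in Python. The 1% frequency tolerance, the 30 GHz uplink and the 16-fold oversampling are the values assumed in the text; the function names are hypothetical. The same window calculation applies to the 1.4 MHz carrier spacing of example 2.

```python
import math

def conversion_accuracy(channel_width_hz, carrier_hz, fraction=0.01):
    """Frequency tolerance (Hz) and the fractional oscillator accuracy it implies
    when the error may be `fraction` of the channel width (1% in the text)."""
    tolerance_hz = fraction * channel_width_hz
    return tolerance_hz, tolerance_hz / carrier_hz

def fft_window(bandwidth_hz, spacing_hz, oversample=16):
    """Return (minimum W/B samples, oversampled count, N rounded up to a
    power of 2, window duration N/W in microseconds)."""
    minimum = bandwidth_hz / spacing_hz            # one spectral sample per carrier
    wanted = oversample * minimum                  # 16-fold increase for a practical filter
    n = 2 ** math.ceil(math.log2(wanted))          # next power of 2
    return minimum, wanted, n, n / bandwidth_hz * 1e6

print(conversion_accuracy(45e3, 30e9))  # about 450 Hz and 1.5e-08
print(fft_window(40e6, 45e3))           # example 1: N = 16384, about 409.6 us
print(fft_window(40e6, 1.4e6))          # example 2: N = 512, about 12.8 us
```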

2.1.2.2 FORWARD FFT IMPLEMENTATION

The function of the forward FFT in this application is to obtain 16384
complex frequency samples in the 40 MHz spectrum occupied by the desired
channels. This corresponds to a window 409.6 µs wide. To accomplish this, a
single pipeline processor simultaneously performs an FFT on 50%
overlapping sample windows, each containing N = 16384 complex samples.
Hence, the equivalent of two pipeline processors is required. These complex
samples are processed to translate each of the 800 channels to
baseband (spectrum centered at a carrier frequency of zero Hz).

The processing steps follow:

• BUTTERFLY CALCULATIONS- The pipeline processor will perform
(N/2)log2N = 114,688 butterfly calculations for each FFT sample window.
Each sample window has a duration of 16384 / (40x10^6) = 409.6 µs. Each
butterfly requires one complex multiply (4 real multiplies and 2 real adds)
and two complex adds (4 real adds), for a total of 4 real multiplies and 6
real adds. 16 bit precision is assumed. For processing the 114,688
butterfly calculations, this yields a total of 458,752 real multiplies and
688,128 real adds for each 409.6 µs window, which corresponds to 1.12
multiplies per ns and 1.68 adds per ns. 50% overlap operation doubles
these rates to 2.24 multiplies and 3.36 adds per ns.

• DISTRIBUTION OF THE CALCULATIONS- The pipeline FFT processor
for this example will consist of a cascade of 14 butterfly stages. The
calculations are equally distributed among these, and accordingly the rates
are reduced to 160 multiplies per µs and 240 adds per µs in each
stage. These correspond to 6.25 ns per multiply and 4.17 ns per add. Since
there are 4 real multiplies and 6 real adds per butterfly, if these are
implemented by separate units there is a further rate reduction, resulting in
25 ns per multiply and 25 ns per add. In this case the pipeline processor
would contain 4 x 14 = 56 multipliers and 6 x 14 = 84 adders.

• MEMORY REQUIREMENT- As shown in Figure 2.1, delay memories are
required in each stage to achieve proper time alignment of the samples as
they are processed in the butterflies. There is a single delay of N/2
complex samples in the first stage and a pair of delays of N/2^k complex
samples in each kth stage for 2 ≤ k ≤ log2N. The total number of real
samples in the delay memories of the entire pipeline processor is

$$2\left[\frac{N}{2} + 2\sum_{k=2}^{\log_2 N}\frac{N}{2^k}\right] = 2\left[\frac{N}{2} + 2\left(\frac{N}{2} - 1\right)\right] = 3N - 4$$


The above expression yields 3 x 16384 - 4 = 49,148 real samples. If each
sample is 16 bits, the total memory capacity of the pipeline FFT
processor is 98.3 Kbytes. There are 27 memories ranging in size from 4
bytes (one complex sample) to 32768 bytes (8192 complex samples). The
propagation time through the FFT processing is N/W, which for
this case is 409.6 µs. The memories operate at a 40 Msamp/s rate.
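The butterfly, hardware and memory figures of a radix-2 pipeline FFT all follow from N and the sampling rate W. The sketch below reproduces the arithmetic of the three steps above under the text's assumptions: 4 real multiplies and 6 real adds per butterfly, 50% overlap doubling the rate, separate multipliers and adders in each stage, and 16-bit real samples. The function name is hypothetical, and the same function can be applied to the N = 512 case of example 2.

```python
import math

def pipeline_fft_budget(n, fs_hz, overlap=2, bytes_per_real=2):
    """Arithmetic and memory budget of a radix-2 pipeline FFT, following the
    steps in the text (4 real multiplies and 6 real adds per butterfly)."""
    stages = int(math.log2(n))
    butterflies = (n // 2) * stages                # per sample window
    window_ns = n / fs_hz * 1e9                    # window duration N/W
    mult_per_ns = overlap * 4 * butterflies / window_ns
    add_per_ns = overlap * 6 * butterflies / window_ns
    # each stage carries 1/stages of the load, shared by 4 multipliers and 6 adders
    ns_per_multiplier = 4 * stages / mult_per_ns
    ns_per_adder = 6 * stages / add_per_ns
    # delay memories: one N/2 delay in stage 1, a pair of N/2**k delays in stage k >= 2
    delays = [n // 2] + [n >> k for k in range(2, stages + 1) for _ in range(2)]
    real_samples = 2 * sum(delays)                 # real and imaginary parts; equals 3N - 4
    return {
        "stages": stages,
        "butterflies": butterflies,
        "window_us": window_ns / 1000,
        "mult_per_ns": mult_per_ns,
        "add_per_ns": add_per_ns,
        "ns_per_multiplier": ns_per_multiplier,
        "ns_per_adder": ns_per_adder,
        "multipliers": 4 * stages,
        "adders": 6 * stages,
        "memories": len(delays),
        "memory_real_samples": real_samples,
        "memory_bytes": real_samples * bytes_per_real,
        "largest_memory_bytes": max(delays) * 2 * bytes_per_real,
    }

for n in (16384, 512):  # examples 1 and 2
    print(n, pipeline_fft_budget(n, 40e6))
```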

2.1.2.3 FREQUENCY DOMAIN PRODUCT

The purpose of this operation is to select and shape the spectrum of
each recovered channel. It is performed by calculating the product of the
complex samples from the 16384 coefficient FFT and a model of
the channel filter expressed as a set of 16 complex
coefficients selected to represent the desired filter characteristic
(amplitude and phase). An example of such a filter is shown in Figure 2.2.
These filter coefficients are selected so that the resulting impulse
response is zero in the second half of the 16384 point sampling window.
This is done to eliminate unwanted aliasing contributions caused by
circular convolution. A more detailed description of how the channel
filter coefficients are determined is given in another section. Only the
16 complex FFT coefficients corresponding to the frequency locations of
the 16 complex filter coefficients need be considered to demultiplex a
given channel, because all of the other filter coefficients are zero. Hence,
for each 45 kHz channel, the 16 complex coefficient filter function
multiplies the 2 x 16 complex FFT coefficients of the overlapped windows,
each 409.6 µs wide, as illustrated in Figure 2.3 (two overlapping
windows need to be processed during each sampling window for each
channel). Therefore the rate of complex multiplies is 2 x 16 / 409.6 µs =
0.0781 per µs, which is equivalent to 12.8 µs per complex multiply, or 3.2 µs
per real multiply and 6.4 µs per real add. The result of the frequency
domain product consists of 16 non-zero complex frequency coefficients
out of a total of 16384 coefficients for each window. By
interpreting the frequencies represented by the coefficients as
symmetrical about zero Hz for each channel, the channel is
automatically converted to the desired baseband form.
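A minimal sketch of the frequency domain product for one channel follows. The filter coefficients and the channel center bin are hypothetical placeholders (the report derives shaped coefficients such as those of Figure 2.2). Because 16 is even, the bins are taken as -8 to +7 about the channel center, which is an assumption about how the "symmetrical" bin set is indexed.

```python
def channel_product(spectrum, center_bin, filt):
    """Frequency domain product for one channel.

    Multiplies the len(filt) FFT coefficients centred on center_bin by the
    channel filter coefficients and renumbers the bins relative to the channel
    centre, so the channel appears at baseband. Returns {relative_bin: product}.
    """
    n = len(spectrum)
    half = len(filt) // 2
    return {i - half: spectrum[(center_bin + i - half) % n] * h
            for i, h in enumerate(filt)}

N = 16384
spectrum = [0j] * N
spectrum[1000] = 1 + 1j        # hypothetical carrier component at bin 1000
flat = [1.0] * 16              # hypothetical flat 16-coefficient filter
baseband = channel_product(spectrum, 1000, flat)
print(sorted(baseband)[:2], baseband[0])  # [-8, -7] (1+1j)
```

Only 16 complex multiplies are needed per window and channel, regardless of N; with two overlapping windows this gives the 2 x 16 multiplies per 409.6 µs used above.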

2.1.2.4 INVERSE DISCRETE FOURIER TRANSFORM

An inverse discrete Fourier transform is used to convert the complex
frequency domain coefficients for each channel to sampled data time
domain form. An IDFT rather than an IFFT is used because only
a small number of non-zero coefficients are presented at the input and
only a small fraction of the output samples need to be calculated. The IDFT
calculation is of the form shown in Figure 2.4 and is performed separately
for each window. If the full IDFT were determined for each window, the
result would be 16384 time domain samples, the first half of which would
be discarded because they are aliased; of the second half, only a
fraction are needed because of decimation. By anticipating this, only the
samples needed are computed, greatly reducing the computational
load. Only two samples are needed per modulation symbol, and these only
for the unaliased half of each window. Since each 409.6 µs window spans
13 symbol periods of a 32 ksym/s channel, 13 time domain samples must be
computed per window for each channel, and the number of calculations per
window for each channel in the example being treated is 16 x 13 = 208
complex multiplies every 409.6 µs. Because this calculation must be
performed for each of the overlapping windows, the above calculation rate
must be doubled to 416 every 409.6 µs. The results of the calculations for
both sets of windows taken together constitute the complex sampled data
to be used for subsequent demodulation of the data signal.
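The saving from computing only the needed outputs can be sketched as a pruned IDFT: each requested time sample costs one complex multiply per non-zero coefficient, so 13 samples from 16 coefficients cost 208 complex multiplies per window. The decimation step of 625 input samples (40 Msamp/s reduced to 64 ksamp/s, i.e., two samples per 32 ksym/s symbol), the placement of the retained samples in the second half of the window and the unit coefficients are illustrative assumptions.

```python
import cmath
import math

def pruned_idft(coeffs, n, sample_indices):
    """Evaluate only selected outputs of an N-point inverse DFT.

    coeffs maps a bin index (negative values allowed, i.e. bins measured from
    the channel centre) to its complex coefficient; every other bin is zero.
    Returns the requested samples and the number of complex multiplies used.
    """
    samples = [
        sum(c * cmath.exp(2j * math.pi * k * m / n) for k, c in coeffs.items()) / n
        for m in sample_indices
    ]
    return samples, len(coeffs) * len(sample_indices)

N = 16384
coeffs = {k: 1 + 0j for k in range(-8, 8)}        # hypothetical 16 filtered coefficients
step = 625                                         # hypothetical decimation step
indices = [N // 2 + step * j for j in range(13)]   # 13 samples from the unaliased half
samples, mults = pruned_idft(coeffs, N, indices)
print(mults)  # 208
```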
2.1.2.5 ESTIMATE OF THE IMPLEMENTATION POWER REQUIREMENTS

The following presents estimates of the power requirements for
implementing the multiplications involved in the forward FFT, frequency
multiplication and IDFT functions of an on-board processor that
demultiplexes the 800 channels of the example being considered.

The power required by the adders is expected to be quite small
compared to that required by the multipliers. The estimates presented
are based on present day technology and are expected to be considerably
better with devices that will be available in the future.

• FORWARD FFT- This function requires 56 multipliers, each operating
at a rate of one multiply every 25 ns. Toshiba manufactures a 16 x 16 bit
CMOS/SOS VLSI multiplier with an operation time of 27 ns and a power
dissipation of 150 mW. As a guideline, it is assumed that this unit can be
improved to a speed of 25 ns without increased power. Consequently, the
estimated power needed to implement the FFT multipliers is 56 x 0.15 =
8.4 W.

• FREQUENCY MULTIPLIER- This function requires a rate of 0.3125 real
multiplies/µs (4 real multiplies for each of the 0.0781 complex multiplies
per µs) for each of 800 channels, yielding a total of 0.25
multiplies/ns, or 4 ns per multiply. This rate can be satisfied by using 6 of the
guideline multipliers, which would require a total power of 6 x 0.15 = 0.9 W.

• INVERSE DFT- This function requires the determination of 13 time
domain samples, each requiring 16 complex multiplies, for each of two
overlapping windows. Since each complex multiply requires 4 real
multiplies, the number of real multiplies per channel for each 409.6 µs
window is 16 x 13 x 2 x 4 = 1664, or 4.0625 per µs. For 800 channels this
becomes 3.25 multiplies per ns. This can be satisfied by using 81 of the
guideline multipliers, which yields a power requirement of 12.2 W.
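The multiplier counts and power figures for examples 1 and 2 can be reproduced with the short sketch below. It assumes, as the text does, a guideline multiplier of 25 ns per multiply at 150 mW, and it leaves the multiplier counts unrounded (the text rounds them to whole units); the totals it prints are simply the sums of the three components. The function names are hypothetical.

```python
# Guideline multiplier assumed in the text: 16 x 16 bit, one multiply per 25 ns, 150 mW.
NS_PER_MULTIPLY = 25.0
WATTS_PER_MULTIPLIER = 0.15

def multipliers_for(real_mults_per_us):
    """Unrounded number of guideline multipliers needed for a given rate."""
    return real_mults_per_us / 1000.0 * NS_PER_MULTIPLY  # per us -> per ns, times ns per multiply

def power_budget(fft_multipliers, channels, product_per_us, idft_per_us):
    """(multipliers, watts) for the forward FFT, frequency product and IDFT.

    product_per_us and idft_per_us are real multiplies per microsecond for a
    single channel.
    """
    counts = {
        "fft": float(fft_multipliers),
        "product": multipliers_for(channels * product_per_us),
        "idft": multipliers_for(channels * idft_per_us),
    }
    return {name: (c, c * WATTS_PER_MULTIPLIER) for name, c in counts.items()}

# Per window: 2 x 16 complex multiplies for the product and 2 x 13 x 16 for the
# IDFT, 4 real multiplies each. Example 1 uses 409.6 us windows, example 2 12.8 us.
ex1 = power_budget(56, 800, 2 * 16 * 4 / 409.6, 2 * 13 * 16 * 4 / 409.6)
ex2 = power_budget(36, 24, 2 * 16 * 4 / 12.8, 2 * 13 * 16 * 4 / 12.8)
for name, budget in (("example 1", ex1), ("example 2", ex2)):
    total = sum(w for _, w in budget.values())
    print(name, {k: (round(c, 2), round(w, 2)) for k, (c, w) in budget.items()}, round(total, 2))
```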

2.1.3 EXAMPLE 2, DEMULTIPLEXING OF 24 2.048 MBIT/S CARRIERS

2.1.3.1 BASIC PARAMETER SELECTION

• SAMPLING RATE- 40 MSAMP/S. This rate depends only on the 40 MHz 
spacing of the transponder and the need to accommodate the anti-aliasing 
filter for realizing an occupied bandwidth of 36 MHz. It is independent of
the number of channels to be demultiplexed.

• DOWN CONVERSION- Same as for example 1.

• SAMPLING WINDOW- The same argument given for example 1 applies
to this case, with appropriate changes to account for the difference in the
carriers. For a 2.048 Mbit/s QPSK carrier (1.024 Msym/s), the practical spacing
between channels should be 1.4 times the symbol rate, yielding a spacing
between carriers of 1.4 MHz. In the 36 MHz occupied bandwidth of the transponder,
24 of these can easily be accommodated. The minimum number of complex
samples per window needed to represent such channels is
40x10^6 / 1.4x10^6 = 28.57. However, practical filter design dictates that
this be increased 16-fold, to 457, and when rounded up to the nearest power
of 2 this becomes 2^9 = 512.

2.1.3.2 FORWARD FFT IMPLEMENTATION

In this example the function of the forward FFT is to obtain N = 512
complex frequency samples in the 40 MHz spectrum occupied by the
transponder. A pipeline FFT implementation is used to accomplish this.

• BUTTERFLY CALCULATIONS- The pipeline processor will perform
(N/2)log2N = 2304 butterfly calculations for each FFT sample window.
Each sample window has a duration of 512 / (40x10^6) = 12.8 µs. Each
butterfly requires one complex multiply and two complex adds, comprising
4 real multiplies and 6 real adds. For processing the 2304 butterfly
calculations, this yields a total of 9216 real multiplies and 13,824 real
adds for each 12.8 µs window, which corresponds to 0.72 multiplies and
1.08 adds per ns. 50% overlap operation doubles these rates to 1.44
multiplies and 2.16 adds per ns.

• DISTRIBUTION OF THE CALCULATIONS- The pipeline FFT processor
for this example will consist of a cascade of 9 butterfly stages. The
calculations are equally distributed among these, and accordingly the rates
are reduced to 160 multiplies and 240 adds per µs in each stage.
These correspond to 6.25 ns per multiply and 4.17 ns per add. Since there
are 4 real multiplies and 6 real adds per butterfly, if these are
implemented by separate units there is a further rate reduction, resulting
in 25 ns per multiply and 25 ns per add. These rates are the same as those
calculated in example 1. However, in this case the pipeline processor would
contain 4 x 9 = 36 multipliers and 6 x 9 = 54 adders.

• MEMORY REQUIREMENT- Using the same expression developed for the
memory size in example 1, with N = 512, the memory requirement is
3 x 512 - 4 = 1532 real samples. If each sample is 16 bits, the total
memory capacity of the pipeline FFT processor is 3 Kbytes. There are 17
memories ranging in size from 4 bytes (one complex sample) to 1024 bytes
(256 complex samples). The propagation time through the FFT
processing is N/W, which for this case is 12.8 µs. The memories operate
at a 40 Msamp/s rate.

2.1.3.3 FREQUENCY DOMAIN PRODUCT

The purpose of this operation is to select and shape the spectrum of
each recovered channel. It is performed by calculating the product of the
complex samples from the 512 coefficient FFT and a 16 complex
coefficient model of the channel filter selected to represent the desired
filter characteristic (amplitude and phase). The filter coefficients are
selected by the method described in example 1, which eliminates unwanted
aliasing contributions caused by circular convolution. Only the 16
complex FFT coefficients corresponding to the frequency locations of
the 16 complex filter coefficients need be considered to demultiplex a
given channel, because all of the other filter coefficients are zero. Overlap
and save operation requires that two sets of 16 complex frequency
coefficients be processed for each window. Hence, for each 1.4 MHz
channel, the 16 complex coefficient filter function multiplies the 2 x 16
complex FFT coefficients of the overlapped windows, each 12.8 µs wide.
Therefore the rate of complex multiplies is 2 x 16 / 12.8 µs =
2.5 per µs, which is equivalent to 0.4 µs per complex multiply, or 0.1 µs per
real multiply and 0.2 µs per real add. The result of the frequency domain
product consists of 16 non-zero complex frequency coefficients out of a
total of 512 coefficients for each window. By interpreting the
frequencies represented by the coefficients as
symmetrical about zero Hz for each channel, the channel is automatically
converted to the desired baseband form.

2.1.3.4 INVERSE DISCRETE FOURIER TRANSFORM

The procedure used is the same as that described for example 1, with
the number of samples per window being 512. Only two samples are needed
per symbol, and these only for the unaliased half of each window. Since
each 12.8 µs window spans 13 symbol periods of the 1.024 Msym/s carrier,
13 time domain samples are needed per window, and the number of calculations
per window for each channel is 16 x 13 = 208 complex multiplies every
12.8 µs. Because this calculation must be performed for each of the
overlapping windows, the above calculation rate must be doubled to 416 every
12.8 µs. The results of the calculations for both sets of windows taken
together constitute the complex sampled data to be used for subsequent
demodulation of the data signal.
2.1.3.5 ESTIMATE OF THE IMPLEMENTATION POWER REQUIREMENTS

The following presents estimates of the power requirements for
implementing the multipliers in the forward FFT, frequency
multiplication and IDFT functions of an on-board processor that
demultiplexes the 24 2.048 Mbit/s channels of the example being
considered.

• FORWARD FFT- This function requires 36 multipliers, each operating
at a rate of one multiply every 25 ns. Using the Toshiba 16 x 16 bit
CMOS/SOS VLSI multiplier, with an operation time of 27 ns and a power
dissipation of 150 mW, as a guideline, the estimated power needed to
implement the FFT multipliers is 36 x 0.15 = 5.4 W.

• FREQUENCY MULTIPLIER- This function requires a rate of 10 real
multiplies/µs (4 real multiplies for each of the 2.5 complex multiplies
per µs) for each of 24 channels, yielding a total of 0.24
multiplies/ns, or 4.17 ns per multiply. This rate can be satisfied by using 6 of
the guideline multipliers, which would require a total power of 6 x 0.15 =
0.9 W.


• INVERSE DFT- This function requires the determination of 13 time
domain samples, each requiring 16 complex multiplies, for each of two
overlapping windows. Since each complex multiply requires 4 real
multiplies, the number of real multiplies per channel for each 12.8 µs
window is 16 x 13 x 2 x 4 = 1664, or 130 per µs. For 24 channels this becomes
3.12 multiplies per ns. This can be satisfied by using 78 of the guideline
multipliers, which yields a power requirement of 11.7 W.

2.1.4 EXAMPLE 3, DEMULTIPLEXING OF 400 64 KBIT/S AND 12 2.048 MBIT/S CARRIERS

2.1.4.1 BASIC PARAMETER SELECTION

• SAMPLING RATE- 40 MSAMP/S. This rate depends only on the 40 MHz 
spacing of the transponder and the need to accommodate the anti-aliasing 
filter for realizing an occupied bandwidth of 36 MHz. It is independent of 
the number, bandwidth and distribution of channels to be demultiplexed. 

Each of the 400 64 kbit/s QPSK carriers is assigned to a channel of 45 kHz 
width in one half of the wideband and each of the 12 2.048 Mbit/s QPSK 
carriers to a channel of 1.4 MHz width in the other half. However, channels
of a given bandwidth need not be grouped together because the spectrum 
coefficients of any channel are independently selected. 
• DOWN CONVERSION- Same as for example 1.

• SAMPLING WINDOW AND INPUT COEFFICIENTS- It is assumed that the
same FFT processor processes carriers of both channel widths. The
narrowband carriers drive the resolution requirement, so
the sampling window and number of FFT coefficients processed are the
same as for example 1. Therefore the number of samples will be 16384
and the window width 409.6 µs. Processing narrow bandwidth carriers
in a given wideband channel is more computationally intensive than
processing wider bandwidth carriers. Alternatives for minimizing the
computational intensity needed for mixed bandwidth situations are treated later.

2.1.4.2 FORWARD FFT IMPLEMENTATION

Since it is assumed that a common forward FFT pipeline processor
will be used to process channels of different bandwidths, its frequency
resolution is determined by the narrowest channel, the 45 kHz channel of
the 64 kbit/s carriers. Thus, its implementation is the same as that
described in example 1.

2.1.4.3 FREQUENCY DOMAIN PRODUCT

As in example 1, for each of the 64 kbit/s carriers, a 16 complex
coefficient filter function multiplies the 2 x 16 complex FFT coefficients
of the overlapped windows. Each 2.048 Mbit/s carrier, on the other
hand, occupies a bandwidth 32 times larger than the 64 kbit/s carriers,
and therefore a 512 (32 x 16) complex coefficient filter function is used
to multiply the 2 x 512 complex FFT coefficients of the overlapped
windows.

2.1.4.4 INVERSE DISCRETE FOURIER TRANSFORM

As in example 1, for each of the 64 kbit/s carriers the number of
calculations per window for each channel is 208 complex multiplies every
409.6 µs. Because this calculation must be performed for each of the
overlapping windows, the above calculation rate must be doubled to 416
every 409.6 µs for each narrowband carrier. For the 2.048 Mbit/s
carriers, the number of frequency coefficients in each window is 32 times
larger than for the 64 kbit/s carriers, and the number of resulting time
domain samples is also 32 times larger. Thus, 416 x 32 x 32 = 425,984 complex
multiplies are required every 409.6 µs for each wideband carrier. This
high computational intensity for the 2.048 Mbit/s carriers is a
consequence of mixed bandwidth operation and the use of the IDFT. A much
more efficient IFFT method is discussed in the next section.

2.1.4.5 ESTIMATE OF THE IMPLEMENTATION POWER REQUIREMENTS

The following presents estimates of the power requirements for
implementing the multipliers in the Forward FFT, Frequency Multiplication
and IDFT functions of an onboard processor to accomplish the
demultiplexing of 400 64 Kbit/sec carriers and 12 2.048 Mbit/sec carriers
in a 40 MHz bandwidth.

• FORWARD FFT- This function requires 56 multipliers, each
operating at a rate of one multiply every 25 ns. Using the Toshiba 16x16
bit CMOS/SOS multiplier (150 mW) as a guideline, the estimated power needed to
implement the FFT multipliers is 56 x 0.15 = 8.4 W.

• FREQUENCY MULTIPLIER- This function requires a rate of 0.3125 real
multiplies/µs for each of the 64 Kbit/s carriers and a rate 32 times
larger for each of the 2.048 Mbit/sec carriers. This yields a total of about 0.25
multiplies/ns, or 4 ns/multiply. This rate can be satisfied by using 6 of the
guideline multipliers, which would require a total power of 6 x 0.15 = 0.9 W.

• INVERSE DFT- For each of the 64 Kbit/s carriers, the number of real
multiplies per channel for each 409.6 µs window is 416 x 4 = 1664, or
4.0625/µs. For each of the 2.048 Mbit/sec carriers, the number of real
multiplies per channel for each 409.6 µs window is 416 x 1024 x 4 =
1,703,936, or 4160/µs. For the 400 narrow bandwidth carriers and the 12
wide bandwidth carriers, this becomes 51.545 multiplies per ns. This can
be satisfied by using 1289 of the guideline multipliers, which yields a
power requirement of 194 W.
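The multiply rates quoted in this subsection can be reproduced with a few lines of arithmetic. The Python sketch below is illustrative only: its constant names are hypothetical, and it simply restates the report's assumptions (409.6 µs windows, two overlapped windows, 4 real multiplies per complex multiply, wideband channels 32 times wider, and a guideline multiplier that completes one multiply every 25 ns).

```python
# Guideline multiplier assumed in the report: one multiply every 25 ns.
T_MULT_NS = 25.0
WINDOW_NS = 409.6e3          # 16384 samples at 40 MHz = 409.6 us

N_NARROW, N_WIDE = 400, 12   # 64 kbit/s and 2.048 Mbit/s carriers
K = 32                       # wideband channel width / narrowband width


def rate_per_ns(real_mults_per_window):
    """Convert real multiplies per 409.6 us window into multiplies per ns."""
    return real_mults_per_window / WINDOW_NS


rates = {
    # Forward FFT: 56 pipeline multipliers, each busy every 25 ns.
    "forward FFT": 56 / T_MULT_NS,
    # Frequency product: 2 overlapped windows x 16 complex coefficients
    # x 4 real multiplies per narrowband carrier, K times that per wideband one.
    "freq. product": rate_per_ns(2 * 16 * 4 * (N_NARROW + K * N_WIDE)),
    # IDFT: 416 complex multiplies per narrowband window, 416 x K x K per
    # wideband window, 4 real multiplies each.
    "IDFT": rate_per_ns(4 * 416 * (N_NARROW + K * K * N_WIDE)),
}

for name, r in rates.items():
    # Multiplier-equivalents = rate x operate time.
    print(f"{name:14s} {r:8.3f} mult/ns {r * T_MULT_NS:8.1f} multipliers")
print(f"{'total':14s} {sum(rates.values()):8.3f} mult/ns")
```

The sketch gives 0.245 multiplies/ns for the frequency-domain product, which the text rounds to 0.25, and a total of 54.03 multiplies/ns, the figure listed for the mixed example in Table 2.1. Multiplying a multiplier count by 150 mW gives the corresponding power.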

2.1.5 SUMMARY OF SPEED AND POWER

The results of the demultiplexer implementations for the three
examples considered in the foregoing are tabulated in Table 2.1. Clearly, in
the case of mixed size carriers where the ratio of the widest to narrowest
carrier bit rate and bandwidth is high (32 in our example), the use of the
IDFT to recover the time samples of the individual carriers from the
frequency coefficients of the forward FFT is very computationally
intensive and power consuming. The use of an IFFT followed by an
interpolating filter is therefore preferred to the use of the IDFT. This will
be discussed in detail in the next section, where the IFFT approach will be
shown to be much more efficient.
TABLE 2.1

SUMMARY OF MULTIPLIER SPEED AND POWER REQUIREMENTS
FOR THREE EXAMPLES CONSIDERED

EXAMPLE                                      MULT/ns    POWER, W
800 64 KBIT/S CHANNELS                          5.74        21.5
24 2.048 MBIT/S CHANNELS                        4.85        18.0
400 64 KBIT/S + 12 2.048 MBIT/S CHANNELS       54.03       203.3

THE RESULTS GIVEN ABOVE ARE BASED ON A WIDEBAND
MULTICARRIER INPUT SIGNAL OF 40 MHz BANDWIDTH AND
A 16X16 BIT MULTIPLIER WITH A 25 ns OPERATE TIME
AND A POWER DISSIPATION OF 150 mW.
2.2 DEMULTIPLEXER IMPLEMENTATION WITH A PIPELINE FFT AND IFFT FOR MULTIPLE BANDWIDTH CARRIER OPERATION

2.2.1 GENERAL 

For the onboard demultiplexer/demodulator to be fully flexible and
useful, it must be able to demultiplex multiple carriers of different
bandwidths. The discussion of the previous section revealed that although
the IDFT is suitable for recovering multiple carriers all of the
same bandwidth, it is excessively computationally intensive and power
consuming for use with multiple carriers of mixed bandwidths in the
wideband signal being processed. To overcome this difficulty, it is best to
use an IFFT, implemented using the pipeline approach, in place of the IDFT.
The resulting computational intensity is significantly reduced. It is also
influenced by the choice of implementation of the forward FFT.

This section addresses three ways to accomplish the multiple
bandwidth operation, which differ in the forward FFT: a Single
Large FFT Processor, a Cascade FFT Processor and a Parallel FFT
Processor. These implementations are described in the following, and their
relative performance is compared in terms of the number
of complex multiplications needed to process a block of 16384 (2^14)
complex time domain input signal samples. This number is determined by
the narrowest bandwidth to be processed. It is assumed that the
processor is to demultiplex an input signal spectrum containing 512
narrowband carriers in one half of the spectrum space and 16 wideband
carriers in the other half of the spectrum space, where each wideband
carrier has a width equal to 32 narrowband carriers. The extension to
more bandwidths is straightforward. Carriers of a given
bandwidth may be grouped together in each half of the spectrum or they
may be in disconnected groups distributed arbitrarily in the spectrum. The
comparison is based on the number of multiplications required for each
FFT and IFFT; these numbers must be doubled to account for the "overlap and
save" processing of 50% overlapping blocks.

The results show that the Single Large FFT Processor transforms all
carriers to baseband with the least number of multiplications.

2.2.2 SINGLE LARGE FFT PROCESSOR 

The single large FFT processor is illustrated in Figure 2.5, and the
number of complex multiplications required in the various steps of
processing is given in Table 2.2. Each step is numbered in the figure and
the table. The first processing step is to calculate an FFT that is
sufficient to provide a resolution that supports the narrowest bandwidth
carriers expected. In the example considered, this narrowest bandwidth is
determined by allocating 1024 carriers in the input signal band, and for
each it is assumed that the narrowband processing channel filter can be
suitably realized using 16 frequency coefficients, thus yielding 16 x 1024
= 16384 frequency coefficients. The input signal is sampled in complex
form at a rate of W, where W is the width of the spectrum assigned to the
composite of carriers to be demultiplexed. The complex samples are
presented to the FFT processor in blocks of 16384, and the duration of a
block is 16384/W. The number of multiplies needed to perform this FFT is
approximately (N/2)log2 N, where N is the number of coefficients. The
resulting number of complex multiplies for step 1 is 114,688 for N =
16384.
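The (N/2)log2 N count can be checked with a small radix-2 FFT that tallies one complex twiddle multiplication per butterfly. The Python sketch below is a plain iterative decimation-in-time FFT, not a model of the pipeline hardware; the function name fft_radix2 is hypothetical, and trivial twiddles (such as 1) are counted because a pipeline multiplier processes them anyway.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT.

    Returns (X, n_mults), where n_mults counts one complex twiddle
    multiplication per butterfly, trivial twiddles included.
    """
    n = len(x)
    stages = n.bit_length() - 1
    if n < 2 or n != 1 << stages:
        raise ValueError("length must be a power of two, at least 2")
    # Reorder the input into bit-reversed index order.
    a = [x[int(format(i, f"0{stages}b")[::-1], 2)] for i in range(n)]
    n_mults = 0
    size = 2
    while size <= n:                      # one pass per pipeline stage
        half = size // 2
        w = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            for k in range(half):
                # w ** k stands in for a twiddle that hardware reads from ROM.
                t = w ** k * a[start + k + half]   # counted complex multiply
                n_mults += 1
                a[start + k + half] = a[start + k] - t
                a[start + k] = a[start + k] + t
        size *= 2
    return a, n_mults


x = [complex(n % 3, -n % 5) for n in range(16)]
X, count = fft_radix2(x)
print(count)                 # 32 = (16/2) x log2(16)
print((16384 // 2) * 14)     # 114688, step 1 of Table 2.2
```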

The processing represented by steps 2 and 3 in Table 2.2 and Figure
2.5 converts selected subsets of FFT coefficients corresponding to the
frequency locations of the narrowband channels into the complex signal
basebands of 512 narrowband carriers. This is done by multiplying the FFT
coefficients by the 16 frequency coefficients of the channel filter and
performing an IFFT for each of the 512 narrowband carriers. This requires 8 x
4 ((N/2)log2 N with N = 16) multiplications for each filter, yielding a total of
512 x 32 = 16,384 complex multiplies. Next, the 8 time domain samples resulting from
each of the 50% overlapping sample blocks must be interpolated to derive
samples aligned with the symbols of the digitally modulated carrier. This
interpolation requires 8 multiplies for each complex sample, and the
number of samples is the product of the number of IFFT samples and the
ratio of the bandwidth W to the symbol rate R. This latter ratio is
assumed to be 4/3 for typical QPSK modulated carriers. Thus, the
interpolation of the samples requires 8 x 8 x 4/3 complex multiplications
for each of the 512 narrowband channels. This yields a total of 43,691
multiplications for each block of 16384 samples.

In a similar manner, the wideband processor recovers the basebands of
the 16 wideband carriers in steps 4 and 5. Since these filters are 32
times wider than the narrowband filters, they contain 16 x 32 = 512
FFT coefficients for each wideband channel. The FFT coefficients
corresponding to each wanted channel location are multiplied by the 512
coefficients of the wideband channel filter. The
resulting frequency coefficients are converted to time domain samples by
an IFFT, which requires 256 x 9 ((N/2)log2 N with N = 512) complex
multiplications for each of the 16 channels processed, yielding 16 x 256 x
9 = 36,864 multiplications. This is followed by interpolation processing
of the 256 time domain samples produced by the IFFT to generate 4/3 x
256 samples properly aligned with the symbols of the digitally modulated
carrier. Since each interpolated sample requires 8 complex
multiplications, a total of 16 x 8 x 256 x 4/3 = 43,691 complex
multiplications are required for each block of 16384 samples.

The net total of complex multiplications required to process each 
block of 16384 input complex samples to recover the basebands of 512 
narrowband and 16 wideband channels assumed in the model analyzed is 
255,318 as given in Table 2.2. The wideband and narrowband carriers can 
be located anywhere in the input signal band. Two possible arrangements 
are illustrated in Figure 2.5. 
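The step counts of Table 2.2 follow from the (N/2)log2 N rule and the interpolation assumption of 8 complex multiplies per output sample with 4/3 output samples per IFFT sample. A short Python tally, with hypothetical helper names, reproduces them:

```python
from fractions import Fraction


def fft_mults(n):
    """Complex multiplies of an n-point radix-2 FFT or IFFT: (n/2) log2 n."""
    return n // 2 * (n.bit_length() - 1)


# 8 complex multiplies per interpolated sample, 4/3 samples per IFFT sample.
INTERP = 8 * Fraction(4, 3)

steps = {
    "1) 16384-point common FFT": fft_mults(16384),
    "2) 512 narrowband 16-point IFFTs": 512 * fft_mults(16),
    "3) narrowband interpolation": round(512 * 8 * INTERP),
    "4) 16 wideband 512-point IFFTs": 16 * fft_mults(512),
    "5) wideband interpolation": round(16 * 256 * INTERP),
}
for name, count in steps.items():
    print(f"{name:34s} {count:>9,}")
print(f"{'grand total':34s} {sum(steps.values()):>9,}")
```

Exact fractions are used so that the 4/3 factor is rounded only once per step, as in the table.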

2.2.3 CASCADE FFT PROCESSOR 

The configuration of the cascade FFT processor for demultiplexing
carriers of two different bandwidths is shown in Figure
2.6. The concept is to first process the input signal into the wide bands in
step 1. Those carriers having the narrow bandwidth are processed by a
256 coefficient IFFT in step 2 to convert them back to time domain
sampled signal form. These time domain samples are selected from 32
blocks of each of two streams of 50% overlapping blocks, yielding a block of
8192 time domain samples, which are converted to 8192 frequency domain
coefficients by the FFT processor of step 3. The latter are multiplied by
the 16 coefficients of each of the 512 narrowband filters, and these are
converted to the 512 basebands by the 16 coefficient IFFT and the sample
interpolation processing performed in steps 4 and 5. Those carriers having
the wide bands are processed directly to their basebands using the IFFT
and associated sample interpolator represented by processing steps 6 and
7.


The number of complex multiplications needed to accomplish each
step is tabulated in Table 2.3. Note that the input wideband FFT has only
512 coefficients, as determined by the bandwidth requirement, compared to
the 16384 coefficients for the narrow bands. This is a ratio of 32:1. Thus,
when converting to the FFT needed for the narrowband filters, 32 blocks of
the input FFT processor output are aggregated to form one block for the
narrowband processor. In Table 2.3 this fact is indicated in the column
titled "replications per 16384 samples". The number of complex
multiplications required for each step is tabulated in the rightmost
column of the table. The logic used to arrive at these numbers is the same
as that previously described for the single large FFT processor and is not
repeated here. The total number of complex multiplies needed to convert
512 narrowband and 16 wideband channels for the cascade FFT processor
is 279,894, which is greater than that needed for the single large FFT
processor.

Because the narrowband carriers are processed in bundles of 32, whose
combined width equals that of a wideband carrier channel, the flexibility to
adjust their locations in the input signal spectrum is limited to bundles of 32.

2.2.4 PARALLEL FFT PROCESSOR 

The configuration of the parallel FFT processor for demultiplexing
carriers of two different bandwidths is shown in Figure 2.7. The concept
provides a separate processor for each bandwidth accommodated. For the
narrowband carriers, a 16384 coefficient FFT is used in step 1, followed by
a 16 coefficient IFFT and sample interpolator in steps 2 and 3. For the
wideband carriers, a 512 coefficient FFT is used in step 4, followed by a
16 coefficient IFFT and sample interpolator in steps 5 and 6. The number of complex
multiplications required for each step is given in Table 2.4. The total
number required for processing 512 narrowband and 16 wideband carriers
is 308,566, which is greater than for either of the other methods described
above. This result is not surprising, since the other methods share a
common input FFT processor while the parallel method requires a separate
input processor for each bandwidth accommodated.
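The same bookkeeping extends to the cascade and parallel processors. In the Python sketch below (helper names hypothetical), REPS is the 32 replications of 512-sample blocks per 16384-sample block; the wideband interpolation is rounded after multiplying by REPS so that, as in the tables, 32 x 16 x 8 x 8 x 4/3 rounds to 43,691. Evaluating 32 x 256 x 9 exactly gives 73,728 for each 512-point FFT step.

```python
from fractions import Fraction


def fft_mults(n):
    """Complex multiplies of an n-point radix-2 FFT or IFFT: (n/2) log2 n."""
    return n // 2 * (n.bit_length() - 1)


INTERP = 8 * Fraction(4, 3)   # 8 multiplies x 4/3 samples per IFFT sample
REPS = 16384 // 512           # 512-sample blocks per 16384-sample block (32)

cascade = {
    "1) 512-pt common FFT": REPS * fft_mults(512),
    "2) 256-pt IFFT": REPS * fft_mults(256),
    "3) 8192-pt FFT": fft_mults(8192),
    "4) 512 narrowband 16-pt IFFTs": 512 * fft_mults(16),
    "5) narrowband interpolation": round(512 * 8 * INTERP),
    "6) 16 wideband 16-pt IFFTs": REPS * 16 * fft_mults(16),
    "7) wideband interpolation": round(REPS * 16 * 8 * INTERP),
}
parallel = {
    "1) 16384-pt narrowband FFT": fft_mults(16384),
    "2) 512 narrowband 16-pt IFFTs": 512 * fft_mults(16),
    "3) narrowband interpolation": round(512 * 8 * INTERP),
    "4) 512-pt wideband FFT": REPS * fft_mults(512),
    "5) 16 wideband 16-pt IFFTs": REPS * 16 * fft_mults(16),
    "6) wideband interpolation": round(REPS * 16 * 8 * INTERP),
}
for label, table in (("cascade", cascade), ("parallel", parallel)):
    print(f"{label:9s} {sum(table.values()):>9,}")
```

The resulting ordering (single large FFT below cascade, cascade below parallel) matches the conclusion drawn in the text.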

2.2.5 GENERIC PROCESSOR 

Each figure in this section illustrates two example
distributions of the wide and narrowband channels. Virtually any
arrangement of the channels can be accommodated with only minor
additional calculations required to perform frequency translations
between the output of the input FFT and the inputs to the narrowband and
wideband IFFT processors, respectively. In this discussion, only two
bandwidths have been considered. In an actual processor, many more
bandwidths can be accommodated with very little change in the number of
multiplications required, since the same input samples are
shared among all processors and each operates at a rate dictated by its
share of the total signal spectrum. Furthermore, each processor can be
given an amount of processing power sufficient to accomplish its most
difficult task and be reprogrammed to perform any lesser task. Thus the
unit may contain a number of generic processors that can be programmed
after launch and reprogrammed during their life to accommodate differing
demands. An example of this is seen in the single large FFT processor, for
which 16,384 + 43,691 = 60,075 multiplications are required for the
narrowband channels and 36,864 + 43,691 = 80,555 multiplications are
required for the wideband channels. A generic processor having the
greater capability can do either job. For instance, if a 14 stage pipeline
processor is available and only a 9 stage FFT is needed, then the last 5
stages can be inhibited by microprocessor control.
TABLE 2.2

NUMBER OF MULTIPLICATIONS FOR
A SINGLE LARGE FFT PROCESSOR
PER 16384 COMPLEX TIME DOMAIN SAMPLES
(DOUBLE VALUES FOR OVERLAP AND SAVE)

PROCESSOR TYPE                  REPLICATIONS     COMPLEX                   TOTAL
                                PER 16384 SAMP   MULTIPLICATIONS

COMMON FFT:
1) 16384 COEFF. FFT             1                8192 X 14               114,688

512 NARROWBAND CHANNELS:
2) 512 X 16 COEFF. IFFT         1                512 X 8 X 4              16,384
3) 512 X 8 X 8 X 4/3 INTERP.    1                512 X 8 X 8 X 4/3        43,691

16 WIDEBAND CHANNELS:
4) 16 X 512 COEFF. IFFT         1                16 X 256 X 9             36,864
5) 16 X 8 X 256 X 4/3 INTERP.   1                16 X 8 X 256 X 4/3       43,691

GRAND TOTAL                                                              255,318
TABLE 2.3

NUMBER OF MULTIPLICATIONS FOR
A CASCADE FFT PROCESSOR
PER 16384 COMPLEX TIME DOMAIN SAMPLES
(DOUBLE VALUES FOR OVERLAP AND SAVE)

PROCESSOR TYPE                  REPLICATIONS     COMPLEX                   TOTAL
                                PER 16384 SAMP   MULTIPLICATIONS

COMMON FFT:
1) 512 COEFF. FFT               32               32 X 256 X 9             73,728

512 NARROWBAND CHANNELS:
2) 256 COEFF. IFFT              32               32 X 128 X 8             32,768
3) 8192 COEFF. FFT              1                4096 X 13                53,248
4) 512 X 16 COEFF. IFFT         1                512 X 8 X 4              16,384
5) 512 X 8 X 8 X 4/3 INTERP.    1                512 X 8 X 8 X 4/3        43,691

16 WIDEBAND CHANNELS:
6) 16 X 16 COEFF. IFFT          32               32 X 16 X 8 X 4          16,384
7) 16 X 8 X 8 X 4/3 INTERP.     32               32 X 16 X 8 X 8 X 4/3    43,691

GRAND TOTAL                                                              279,894
TABLE 2.4

NUMBER OF MULTIPLICATIONS FOR
A PARALLEL FFT PROCESSOR
PER 16384 COMPLEX TIME DOMAIN SAMPLES
(DOUBLE VALUES FOR OVERLAP AND SAVE)

PROCESSOR TYPE                  REPLICATIONS     COMPLEX                   TOTAL
                                PER 16384 SAMP   MULTIPLICATIONS

512 NARROWBAND CHANNELS:
1) 16384 COEFF. FFT             1                8192 X 14               114,688
2) 512 X 16 COEFF. IFFT         1                512 X 8 X 4              16,384
3) 512 X 8 X 8 X 4/3 INTERP.    1                512 X 8 X 8 X 4/3        43,691

16 WIDEBAND CHANNELS:
4) 512 COEFF. FFT               32               32 X 256 X 9             73,728
5) 16 X 16 COEFF. IFFT          32               32 X 16 X 8 X 4          16,384
6) 16 X 8 X 8 X 4/3 INTERP.     32               32 X 16 X 8 X 8 X 4/3    43,691

GRAND TOTAL                                                              308,566
2.2.6 POWER ESTIMATES FOR THE FFT-IFFT IMPLEMENTATION

Tables 2.2, 2.3 and 2.4 present the number of complex multiplications
required in processing a window of 16,384 samples through an FFT, an IFFT
and an interpolating filter. As mentioned in a previous section, the use of
an IFFT followed by an interpolating filter is more efficient than using an
IDFT when carriers of widely varying bandwidths are to be demultiplexed.

To obtain power estimates from Tables 2.2-2.4, proceed as follows.
Assuming a 40 MHz bandwidth (including the guardbands at the edges) and
a 40 MHz sampling rate, a window of 16384 time samples has a duration of
16384/(40 x 10^6) = 409.6 µs. During 409.6 µs, two windows must be
processed because of the overlap operation. Thus the grand totals shown
in Tables 2.2-2.4 represent the number of complex multiplications in 409.6/2
= 204.8 µs. With 4 real multiplications per complex multiplication and
using the guideline multiplier of 25 ns and 150 mW, we obtain the
following estimates.

For the single large FFT processor (Table 2.2), the number of
multipliers required to perform the demultiplexing and interpolation
functions becomes:

$$\frac{255{,}318 \times 4 \times 25\ \text{ns}}{204.8 \times 10^{3}\ \text{ns}} \approx 125\ \text{multipliers}$$

Using 150 mW/multiplier, the net power dissipation is:

• Large FFT Processor Power = 125 x 0.150 ≈ 18.8 W.

The above number represents the estimated power required to perform the 
necessary multiplications in the demultiplexing and interpolation 
processes. As we mentioned earlier, the power required for the additions 
is a small fraction of the power required for multiplications. Therefore 
the above figure is representative of the total computational power 
required in the demultiplexing and interpolation functions. 

A similar calculation for the cascade FFT processor (Table 2.3) leads to:

• Cascade FFT Processor Power = 137 x 0.15 ≈ 20.5 W

and for the parallel FFT processor (Table 2.4) to:

• Parallel FFT Processor Power = 151 x 0.15 ≈ 22.6 W.
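The multiplier counts above follow from dividing the real-multiplication workload by the number of 25 ns operations that fit in 204.8 µs. The Python sketch below repeats that calculation for all three processors (names hypothetical); it rounds up to whole multipliers and takes the parallel-processor total as 308,566, the exact sum of its steps.

```python
import math

T_MULT_NS = 25        # guideline multiplier operate time (ns)
P_MULT_W = 0.150      # guideline multiplier dissipation (W)
BUDGET_NS = 204_800   # 409.6 us / 2: one of two overlapped windows

totals = {"single large": 255_318, "cascade": 279_894, "parallel": 308_566}


def multipliers(complex_mults):
    """Whole multipliers needed: 4 real per complex, 25 ns each, in 204.8 us."""
    return math.ceil(complex_mults * 4 * T_MULT_NS / BUDGET_NS)


for name, total in totals.items():
    n = multipliers(total)
    print(f"{name:12s} {n:4d} multipliers {n * P_MULT_W:6.2f} W")
```

The resulting 18.75, 20.55 and 22.65 W appear in the text, after rounding, as 18.8, 20.5 and 22.6 W.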
2.2.7 SUMMARY

Three methods for implementing the demultiplexer to accommodate
carriers of different bandwidths have been studied. The method that
uses a single large forward FFT processor followed by an IFFT for individual
channel selection results in the least computational intensity and power
consumption for the overall processing of all carriers. This
method also has unlimited flexibility for accommodating various
arrangements of carrier locations and bandwidths in the input signal band.
Because of these desirable properties, it is the preferred method chosen
for further consideration. Compared with the FFT-IDFT implementation of the
previous section, which required over 200 W, any of the three methods using
the IFFT discussed in this section consumes far less power.
2.3 COMPARISON OF RADIX 2 AND RADIX 4 FFT IMPLEMENTATIONS

2.3.1 GENERAL

This section presents a comparison of the radix 2 and radix 4 pipeline
implementations of the FFT. It is concluded that the radix 4
implementation increases the number of multipliers and adders
by factors of 1.5 and 1.83 compared to the radix 2 implementation, while
reducing the required aggregate rates of multiplication and addition by
factors of 0.75 and 0.9167. The radix 4 implementation would therefore be of
interest only if the speed of multiplication becomes a limiting factor.
Otherwise, the radix 2 design would be preferred.

2.3.2 NUMBER OF STAGES 

In the implementation of the FFT, the pipeline architecture can be
expressed as a cascade of Discrete Fourier Transforms (DFTs), and the
lowest order transform that is conceivable is the 2x2, or radix 2, DFT.
When the pipeline FFT is implemented using the 2x2 DFT, it is referred to
as a Radix 2 FFT. This implementation was described in an
earlier section. For a sample window containing N samples, the number of
radix 2 pipeline stages is given by the expression

$$S_{\text{RADIX 2}} = \log_2 N$$

In a radix 4 implementation, the DFT processes a 4x4 subset of samples 
and consequently for a sample window of N samples, the number of radix 4 
pipeline stages is given by the expression 

$$S_{\text{RADIX 4}} = \log_4 N = \tfrac{1}{2}\log_2 N$$

Thus the radix 4 implementation halves the number of pipeline stages 
needed relative to the radix 2 to perform the FFT. A block diagram of a 
radix 4 pipeline implementation for a 64 sample window is shown in 
Figure 2.8. 

2.3.3 COMPUTATION SPEED 

With regard to the speed of computation, each radix 4 stage has twice
as long to perform its processing and consequently operates at half the
rate of the radix 2 stage. Since there are one half the number of stages
and each operates at one half the rate, the aggregate stage clocking rate is
cut to one fourth that of the radix 2 implementation; the larger radix 4
butterfly, considered next, offsets much of this saving.

2.3.4 NUMBER OF COMPUTATIONS PER STAGE 

Diagrams of the radix 2 and radix 4 computational elements (also 
called butterflies) of each stage are shown in Figure 2.9 for comparison. 
Each radix 2 butterfly comprises 1 complex multiplier and 2 complex adders,
which in turn require 4 real multipliers and 6 real adders (a complex
multiply takes 4 real multiplies and 2 real adds), whereas each
radix 4 butterfly comprises 3 complex multipliers and 8 complex adders,
which in turn require 12 real multipliers and 22 real adders. Thus, the
total numbers of real multipliers and real adders for each radix are

$$
\begin{aligned}
\text{No. of RADIX 2 Adders} &= 6\log_2 N \\
\text{No. of RADIX 4 Adders} &= 11\log_2 N \\
\text{No. of RADIX 2 Multipliers} &= 4\log_2 N \\
\text{No. of RADIX 4 Multipliers} &= 6\log_2 N
\end{aligned}
$$
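The operation counts of the radix 4 butterfly can be made concrete. The Python sketch below (function names hypothetical) implements a radix 4 decimation-in-time butterfly with 3 complex multiplications and 8 complex additions; multiplication by -j is done by swapping real and imaginary parts and changing a sign, so it needs no multiplier. Counting 4 real multiplies and 2 real adds per complex multiply, and 2 real adds per complex add, gives the 12 real multipliers and 22 real adders quoted above.

```python
import cmath


def times_minus_j(z):
    """Multiply by -j using a swap and a sign change (no real multiplies)."""
    return complex(z.imag, -z.real)


def radix4_butterfly(a, w):
    """Radix-4 DIT butterfly: 3 complex multiplies and 8 complex adds.

    a: the four inputs; w: twiddles (w1, w2, w3) applied to a[1], a[2], a[3].
    """
    b1, b2, b3 = w[0] * a[1], w[1] * a[2], w[2] * a[3]   # 3 complex multiplies
    t0, t1 = a[0] + b2, a[0] - b2                        # 2 complex adds
    t2, t3 = b1 + b3, b1 - b3                            # 2 complex adds
    u = times_minus_j(t3)
    return [t0 + t2, t1 + u, t0 - t2, t1 - u]            # 4 complex adds


# With unit twiddles the butterfly is a plain 4-point DFT.
x = [1 + 2j, -0.5 + 1j, 3 - 1j, 0.25j]
direct = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / 4) for n in range(4))
          for k in range(4)]
print(max(abs(p - q) for p, q in zip(radix4_butterfly(x, (1, 1, 1)), direct)))
```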

From the above it is seen that the numbers of adders and multipliers needed
for the radix 4 implementation exceed those needed for the radix 2
implementation; however, the influence of speed has yet to be accounted
for. The clock speed of the radix 2 design, which processes 2 samples at a
time, is 1/2 the sample rate, while that of the radix 4 design, which
processes 4 samples at a time, is 1/4 the sampling rate. Consequently, the
rates of adds and multiplies for the radix 2 and radix 4 implementations,
assuming a sampling rate of R per second, are respectively

$$
\begin{aligned}
\text{RADIX 2 add speed} &= 3.0\,R\log_2 N \\
\text{RADIX 4 add speed} &= 2.75\,R\log_2 N \\
\text{RADIX 2 mult speed} &= 2.0\,R\log_2 N \\
\text{RADIX 4 mult speed} &= 1.5\,R\log_2 N
\end{aligned}
$$
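These expressions can be tabulated for a specific window size. The Python sketch below (the function name pipeline_costs is hypothetical) uses the per-butterfly counts given above, one butterfly per stage, and a stage clock of R/2 or R/4. For N = 16384, a power of 4, it gives the 56 radix 2 multipliers used in the earlier forward FFT power estimate and the ratios quoted in the summary.

```python
import math


def pipeline_costs(n, radix, sample_rate=1.0):
    """Real multipliers/adders and aggregate rates of a pipeline FFT.

    Per-butterfly counts from the text: radix 2 -> 4 multipliers, 6 adders;
    radix 4 -> 12 multipliers, 22 adders. One butterfly per stage, clocked
    at sample_rate / radix.
    """
    per_butterfly = {2: (4, 6), 4: (12, 22)}[radix]
    stages = round(math.log(n, radix))
    mults, adds = (c * stages for c in per_butterfly)
    clock = sample_rate / radix
    return {"stages": stages, "mults": mults, "adds": adds,
            "mult_rate": mults * clock, "add_rate": adds * clock}


r2, r4 = pipeline_costs(16384, 2), pipeline_costs(16384, 4)
for key in ("stages", "mults", "adds", "mult_rate", "add_rate"):
    print(f"{key:9s} {r2[key]:7.2f} {r4[key]:7.2f} ratio {r4[key] / r2[key]:.4f}")
```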
2.3.5 SUMMARY

From the above discussion comparing the radix 2 and radix 4 pipeline
FFT implementations, it can be concluded that:

1. The radix 4 compared to the radix 2 implementation increases the
number of multipliers by a factor of 1.5 and the number of adders
by a factor of 1.833. This increases the size of the overall
processor accordingly.

2. The radix 4 compared to the radix 2 implementation decreases the
aggregate speed required of the multipliers by a factor of 0.75 and that
of the adders by a factor of 0.9167.

Use of a radix 4 implementation is of interest if the speed of the 
multipliers becomes the limiting factor. Otherwise the radix 2 design is 
preferred. 
3.0 RECOVERY OF THE TIME DOMAIN SAMPLES OF SELECTED CHANNELS

3.1 GENERAL

To recover a given carrier from the 40 MHz band processed by the input
FFT, it is necessary to calculate the product of the FFT coefficients and
the coefficients of a channel filter, stored in onboard memory, that defines
the bounds of the wanted channel. The FFT processing required to
obtain the spectrum coefficients of the input multicarrier signal has been
discussed in the previous sections. The method used to obtain the
coefficients of the channel filter is now discussed, and this is followed by
a discussion of the processing used to recover the time domain samples
needed at the input to the demodulator. The discussion is presented in
terms of the recovery of multiple 1.024 Msym/s carriers, each
carrying 2.048 Mbit/s, which from the previous discussion require a 512
point FFT over a 40 MHz spectrum allocation.

To accomplish recovery of the samples, first the forward FFT
coefficients must be filtered by a channel filter to select the wanted 
components and next an interpolation filter must be applied to calculate 
the properly phased time domain samples needed at the demodulator input. 
The time domain samples delivered at the output of the IFFT processor are 
timed relative to the clock that controls the demultiplexer and this clock 
is established by the wideband signal sampler located at the input to the 
forward FFT. The time domain samples that are used in the demodulator 
are established by the need to sample the carrier signal appearing at the 
input to the demodulator at twice the symbol rate. Furthermore, the phase 
of the samples must be adjusted according to a phase control signal from 
the demodulator to align the samples at the proper positions in each 
symbol. These points will become clear in the discussion of the 
demodulator which comes in a later section. To accomplish this, a sample 
interpolator is needed between the demultiplexer and the demodulator. 

The discussion concludes with the description of an IFFT method
recently identified by COMSAT Labs that is still being
developed more fully. This method promises to provide a means for
simultaneously performing the IFFTs of a multiplicity of carriers of
different sizes in the same pipeline processor. As currently conceived, the
pipeline processor must be reprogrammed for each different bandwidth
processed, and simultaneous processing of different bandwidths requires
parallel pipeline IFFTs.
3.2 CHANNEL FILTER FREQUENCY COEFFICIENTS

First, a frequency domain transfer function of the baseband channel 
filter is selected. Typically, this may be a square root Nyquist response with
40% roll-off for the symbol rate selected. The transfer function is sampled in
such a way that the 40 MHz band is covered by 256 (1/2 x 512) equally spaced frequency
domain points. Most of the samples will be zero because the wanted 
channel only covers a small fraction of the total spectrum. Next, the 
inverse transform is performed over the 256 frequency domain points to 
produce a 256 sample time domain impulse response of the filter. The next 
step is to add 256 zeros to extend the impulse response to a length of 512 
time domain samples and perform a 512 point Fourier transform which 
results in a 512 sample frequency domain transfer function. This is the 
frequency domain function which performs the interpolation among the 
samples needed to satisfy the conditions of the overlap and save method 
for removing the unwanted aliasing samples of circular convolution. The 
interpolation process leads to nonzero frequency coefficients outside the
desired bandwidth, which are small (below -40 dB) and may be set to zero
without introducing significant error. The resulting channel filter function
is stored in memory and used to multiply the 512 coefficients of each 
signal spectrum to recover the frequency domain samples of each carrier 
channel. 
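The four design steps can be followed numerically. The Python sketch below assumes the standard square root raised-cosine response with a 40% roll-off, the 1.024 Msym/s carrier of this section, and a threshold of -40 dB relative to the largest coefficient; it uses a direct O(N^2) DFT for clarity where flight hardware would use an FFT, and all names in it are hypothetical.

```python
import cmath
import math

FS = 40e6      # band covered by the forward FFT (Hz)
RS = 1.024e6   # symbol rate of the 2.048 Mbit/s QPSK carrier (sym/s)
ALPHA = 0.4    # roll-off of the square root Nyquist filter


def srrc(f):
    """Square root raised-cosine magnitude at frequency f (Hz)."""
    f1, f2 = (1 - ALPHA) * RS / 2, (1 + ALPHA) * RS / 2
    f = abs(f)
    if f <= f1:
        return 1.0
    if f >= f2:
        return 0.0
    return math.sqrt(0.5 * (1 + math.cos(math.pi * (f - f1) / (ALPHA * RS))))


def dft(x, inverse=False):
    """Direct O(N^2) DFT; the inverse includes the 1/N scale factor."""
    n, sign = len(x), (1 if inverse else -1)
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


# 1) Sample the transfer function at 256 points across 40 MHz
#    (bins 128..255 stand for the negative frequencies k - 256).
H256 = [srrc((k if k < 128 else k - 256) * FS / 256) for k in range(256)]
# 2) Inverse transform: 256-sample impulse response.
h = dft(H256, inverse=True)
# 3) Append 256 zeros and transform to 512 frequency coefficients.
H512 = dft(h + [0j] * 256)
# 4) Zero the coefficients more than 40 dB below the largest one.
peak = max(abs(c) for c in H512)
coeffs = [c if abs(c) >= peak * 10 ** (-40 / 20) else 0j for c in H512]
print(sum(1 for c in coeffs if c != 0), "nonzero coefficients of", len(coeffs))
```

Because the zero padding only interpolates, every even-numbered coefficient of the 512-point result equals the corresponding sample of the original 256-point transfer function; the odd-numbered coefficients are the interpolated values between them.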

3.3 INVERSE FOURIER TRANSFORM 

Following multiplication of the output of the FFT by the channel
filter's frequency coefficients (which are stored in RAM), there will be 512
frequency points (only a few of which are nonzero) representing a
particular carrier. This process is repeated for all carriers by choosing 
the part of the FFT spectrum where each carrier is located and multiplying 
it by the corresponding filter's coefficients. What remains then is to 
invert those frequency coefficients on a carrier by carrier basis. There 
are several methods to perform this inverse operation. 
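The selection and multiplication can be expressed compactly. In the Python sketch below, the function name extract_channel and the dictionary representation of the filter (nonzero coefficients keyed by bin offset from the carrier center) are assumptions for illustration; indexing modulo N lets a carrier sit anywhere in the band, including across the wrap-around between the highest and lowest bins.

```python
def extract_channel(spectrum, center_bin, filt):
    """Multiply one carrier's portion of the FFT output by its channel filter.

    spectrum:   forward-FFT coefficients of the whole band (length N)
    center_bin: bin index of the carrier's center frequency
    filt:       {offset: coefficient}, nonzero filter coefficients keyed by
                bin offset from the carrier center
    Returns {offset: product}: the carrier's coefficients shifted to baseband.
    """
    n = len(spectrum)
    return {off: spectrum[(center_bin + off) % n] * c for off, c in filt.items()}


spectrum = [complex(k, -k) for k in range(512)]   # stand-in FFT output
filt = {-1: 0.5, 0: 1.0, 1: 0.5}                  # toy 3-coefficient filter
for carrier_bin in (40, 300):
    print(carrier_bin, extract_channel(spectrum, carrier_bin, filt))
```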

a) IDFT Method 

The first method consists of computing the desired time samples one
at a time using the inverse DFT relationship

$$x(t_j) = c \sum_{k} X_k \, e^{\,i 2\pi k t_j / T}$$

where x(t_j) is the desired time sample at t_j, X_k are the frequency
coefficients at k = 0, 1, 2, ..., T is the duration of the transform block
(so that 1/T is the frequency spacing), and c is a constant. Only the nonzero
frequency coefficients need be included in the above sum. The time instants t_j at
which samples need to be computed are obtained from the clock
synchronizer.
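A direct evaluation of this sum at an arbitrary instant t_j is sketched below in Python (function name hypothetical). It keeps only the nonzero coefficients, with negative k denoting bins below the carrier center, so its cost grows linearly with their number. At the instants t = nT/N it reproduces an ordinary N-point inverse DFT.

```python
import cmath


def sample_at(t, coeffs, T, c=1.0):
    """Evaluate x(t) = c * sum_k X_k * exp(i 2 pi k t / T) at any instant t.

    coeffs: {k: X_k}, only the nonzero frequency coefficients (k < 0 for
    bins below the carrier center); T: duration of the transform block.
    Cost: one complex multiply per nonzero coefficient (plus the exponential).
    """
    return c * sum(X * cmath.exp(2j * cmath.pi * k * t / T) for k, X in coeffs.items())


coeffs = {0: 1.0 + 0j, 1: 0.5j, -2: 0.25}
print(sample_at(0.3, coeffs, T=1.0))   # an instant between IDFT grid points
```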

Two samples per symbol are adequate for detection and 
synchronization. For detection, the samples should be at the middle of the
symbols (maximum eye opening); these are taken to be the even
samples. To maintain synchronization, an additional set of samples is
needed at the zero crossings where symbol transitions occur (minimum eye
opening); these are taken to be the odd samples. Therefore, the time
instants at which samples should be computed are separated by half a 
symbol duration. Clock adjustment is performed by an acquisition and 
tracking procedure described in the section on demodulation. Within each 
block, the time domain samples in the first half of the block should be 
discarded as dictated by the overlap and save technique. This is because 
this first half suffers from the aliasing arising from the circular 
convolution. 
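The discarding rule can be checked with a minimal overlap-and-save filter. The Python sketch below (names hypothetical) is a generic single-channel version rather than the multicarrier processor of this report: it moves through the input in 2L-sample blocks that overlap by 50%, multiplies each block's FFT by the filter's frequency response, and keeps only the second half of each inverse transform, where circular convolution equals linear convolution.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (len(x) a power of two); inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def overlap_save(x, h, L):
    """Filter x with FIR h using 2L-sample blocks that overlap by 50%.

    The first L outputs of each block are corrupted by circular wrap-around
    and are discarded; the last L are kept. Requires len(h) <= L + 1.
    """
    if len(h) > L + 1:
        raise ValueError("filter too long for this block size")
    n = 2 * L
    H = fft(list(h) + [0j] * (n - len(h)))
    padded = [0j] * L + list(x)          # leading zeros prime the first block
    y = []
    for start in range(0, len(x), L):
        block = padded[start:start + n]
        block += [0j] * (n - len(block))
        Y = [a * b for a, b in zip(fft(block), H)]
        y += [v / n for v in fft(Y, inverse=True)][L:]   # keep second half
    return y[:len(x)]


x = [complex((3 * i) % 7 - 3, i % 2) for i in range(40)]
h = [0.5, -0.25, 0.125j, 1.0]
y = overlap_save(x, h, L=8)
direct = [sum(h[j] * x[m - j] for j in range(len(h)) if m >= j) for m in range(len(x))]
print(max(abs(a - b) for a, b in zip(y, direct)))
```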

The advantage of doing the inversion one sample at a time as 
described above is that only the samples that are needed are computed. 
Thus the aliased samples are not computed at all. The number of 
multiplications per output sample increases linearly however as the 
carrier size (number of non zero frequency coefficients) increases. 

b) IFFT Method (Non Power Of Two)

In contrast, if an inverse FFT (IFFT) operation is performed on the set
of nonzero frequency coefficients, the number of multiplications per output
sample grows only logarithmically with carrier size. This leads to a second
approach for inverting the frequency coefficients. As mentioned above, the
time samples required are separated by half-symbol intervals. To obtain
precisely these samples at the output of the IFFT would require that the frequency
coefficients used in the transform span a frequency range exactly equal to
twice the inverse of a symbol duration. In general, this implies a
noninteger number of frequency points, since the frequency resolution and
the inverse of a symbol duration are not simply related. Although the
error resulting from rounding to the nearest integer may be acceptable,
the size of the resulting IFFT will not in general be a power of two.
Algorithms for non-power-of-two Fourier transforms exist and could be
used. Power-of-two Fourier transforms are preferred, however, because
they have a simpler control structure.

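For the 1.024 Msym/s carrier of this section the mismatch is easy to quantify. Assuming the 512-point FFT over 40 MHz used earlier in this section, the short Python calculation below shows that a span of exactly two samples per symbol requires a noninteger number of frequency points.

```python
FS = 40e6       # band covered by the forward FFT (Hz)
N_FFT = 512     # forward FFT size for the 1.024 Msym/s carriers
RS = 1.024e6    # symbol rate (sym/s)

df = FS / N_FFT        # frequency resolution: 78.125 kHz
span = 2 * RS          # two samples per symbol -> 2.048 MHz
points = span / df     # frequency points an exact IFFT would need
print(f"resolution {df / 1e3:.3f} kHz, points needed {points:.4f}, "
      f"rounded {round(points)}")
```

Rounding to 26 points gives an IFFT size that is not a power of two.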
c) IFFT Method (Power Of Two) 

The third approach that is now presented uses powers of two Fourier 
transforms. In this third approach of inverting the frequency coefficients 
to recover the time domain samples, an IFFT whose size is a power of two 
is used. The chosen power of two is the smallest power of two that is 
larger than the number of non zero frequency coefficients. (Later in this 
section, we shall discuss how several IFFTs of different sizes can be 
implemented simultaneously in a single pipeline.) As shown in Figure 3.1,
the samples at the IFFT output will not correspond to the desired even and 
odd samples and therefore an interpolation process will be required. The 
interpolation filter must be chosen such that the combined filtering of the 
demultiplexing filter and the interpolation filter approximate the desired 
square root Nyquist response. 
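The following sketch picks the power-of-two IFFT size for one carrier and shows the resulting output sample rate. The carrier's occupied bandwidth ($1.4R_s$, i.e., 36 nonzero bins after rounding up), the symbol rate, and the bin spacing are illustrative assumptions, and "larger than" in the text is read here as "not smaller than".

```python
import math


def next_power_of_two(n):
    """Smallest power of two that is >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()


def power_of_two_ifft_plan(nonzero_bins, symbol_rate_hz, bin_spacing_hz):
    size = next_power_of_two(nonzero_bins)
    output_rate_hz = size * bin_spacing_hz  # IFFT output sample rate
    samples_per_symbol = output_rate_hz / symbol_rate_hz
    return size, samples_per_symbol


# Hypothetical carrier: Rs = 1 Msym/s, occupied band 1.4*Rs, 39.0625 kHz bins.
bins = math.ceil(1.4e6 / 39062.5)  # 36 nonzero coefficients
size, sps = power_of_two_ifft_plan(bins, 1.0e6, 39062.5)
print(size, sps)  # 64 2.5
```

Because 2.5 samples per symbol is not the two samples per symbol the demodulator expects, the IFFT output has to be interpolated.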

3.4 CHOICE OF THE SAMPLE INTERPOLATION FILTER. 

The interpolation filter is used to weight the samples generated at the IFFT output to determine the properly phased samples needed at the input of the demodulator. Its coefficients must be chosen jointly with those of the channel filter.

Three different ways for choosing these filters are shown in Figure 
3.2 and discussed below. 

Case A uses the desired square-root Nyquist filter at the demultiplexer output and a brick-wall filter for the interpolation. This is not a good choice because of the difficulty (large computational requirements) of implementing a brick-wall filter, which theoretically has an infinitely long impulse response and is hence impractical.

Case B divides the square-root Nyquist characteristic equally between the demultiplexing and interpolating filters, so that each has a fourth-root Nyquist response. This approach is preferable to Case A but is still not very attractive, because the sharp characteristics of the fourth-root Nyquist function result in a very long impulse response.

In Case C, a square-root Nyquist filter is used at the demultiplexer, as in Case A. However, a larger inverse FFT is used. As shown in Figure 3.2, the interpolation filter characteristic can then be flat over the range of frequencies where the demultiplexer filter response is nonzero, with a smooth transition to zero over the range of frequencies where the demultiplexer filter response is zero. This simplifies the interpolation process considerably. Indeed, simulation results show that only a few samples (at most 16) need be used in the computation of any desired interpolated sample. The impulse response coefficients of the interpolating filter would be stored in memory. The number of coefficients to be stored depends on two factors. The first is the number of symbols over which the impulse response is nonzero; as mentioned above, this number is minimized by choosing a smooth frequency characteristic. The second is the accuracy needed in subdividing a symbol interval. Simulation results show that 32 samples per symbol interval, i.e., the ability to compute the sample value at any of 32 equally spaced locations within a symbol interval, is quite adequate. This would correspond to storing no more than 256 coefficients of the impulse response.
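The sketch below shows how such a stored coefficient table might be organized and used: `PHASES` fractional positions times `TAPS` kernel taps gives 32 x 8 = 256 stored coefficients. The raised-cosine kernel (flat passband, smooth cosine transition), its roll-off of 0.4, the 8-tap span, and indexing the 32 phases per input-sample interval rather than per symbol are all illustrative assumptions, not the filter designed in this study.

```python
import math

PHASES = 32  # fractional positions per interval (32 per symbol in the text)
TAPS = 8     # assumed kernel span, in input samples
BETA = 0.4   # hypothetical roll-off of the smooth transition band


def kernel(t, beta=BETA):
    """Raised-cosine impulse response; t is in units of the input sample spacing."""
    if t == 0.0:
        return 1.0
    sinc = math.sin(math.pi * t) / (math.pi * t)
    denom = 1.0 - (2.0 * beta * t) ** 2
    if abs(denom) < 1e-12:
        return (math.pi / 4.0) * sinc  # limiting value at t = +/- 1/(2*beta)
    return sinc * math.cos(math.pi * beta * t) / denom


def build_table(phases=PHASES, taps=TAPS):
    """table[p][k] weights input sample n + k - taps//2 + 1 for output at n + p/phases."""
    half = taps // 2
    return [[kernel(p / phases - (k - half + 1)) for k in range(taps)]
            for p in range(phases)]


def interpolate(x, n, mu, table, phases=PHASES):
    """Approximate x at position n + mu (0 <= mu < 1), quantizing mu to 1/phases."""
    p = int(round(mu * phases))
    if p == phases:  # rounding carried into the next interval
        n, p = n + 1, 0
    taps = len(table[p])
    half = taps // 2
    return sum(table[p][k] * x[n + k - half + 1] for k in range(taps))


table = build_table()
print(len(table) * len(table[0]))  # 256 stored coefficients
```

Each interpolated output then costs one table row lookup and `TAPS` multiply-accumulates, which is the property that makes Case C attractive.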

3.5 LINEAR AND CIRCULAR INTERPOLATION

3.5.1 LINEAR INTERPOLATION. 

The first option, called linear interpolation, consists of the following steps, illustrated in Figure 3.3:

1. At the output of each IFFT frame, select the time-domain samples corresponding to the carrier under consideration.

2. For each IFFT frame, discard the first half of the samples corresponding to the carrier under consideration.

3. Juxtapose the second half from frame N with the second half from frame N-1, and so on, to form a continuous stream of samples.

4. Use this stream as the input to the interpolating filter and compute the output interpolated samples at the time instants indicated by the clock synchronizer output.
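Steps 2 and 3 amount to keeping the latter half of each frame and concatenating those halves in time order. A minimal sketch, assuming each frame is already a list of the carrier's IFFT output samples (step 1) and that frames are listed oldest first; the function name is hypothetical:

```python
def linear_stream(frames):
    """Concatenate the second half of each frame, oldest frame first (steps 2-3)."""
    stream = []
    for frame in frames:
        stream.extend(frame[len(frame) // 2:])  # discard the first half (step 2)
    return stream


print(linear_stream([[0, 1, 2, 3], [4, 5, 6, 7]]))  # [2, 3, 6, 7]
```

The interpolation of step 4 then runs over the stream as one long sequence, which is why this option has to retain samples from earlier frames.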

3.5.2 CIRCULAR INTERPOLATION. 

The second option, called circular interpolation, consists of the following steps, illustrated in Figure 3.4:

1. At the output of each IFFT frame, select the samples corresponding to the carrier under consideration.

2. Arrange the samples corresponding to the carrier of interest at the output of each IFFT frame in a circular manner (i.e., as if they constituted one period of a periodic signal). This is simply implemented by numbering the samples 0, 1, 2, ..., N-1 and using a modulo-N operation. Thus sample N would be sample 0, sample N+1 would be sample 1, and so on.

3. Use these samples as the input to the interpolating filter and compute the output samples at the indicated time instants.
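A minimal sketch of steps 2 and 3: every sample index the interpolating filter touches is reduced modulo N, so a position near the end of a frame draws on samples from its beginning. The raised-cosine kernel, its roll-off of 0.4, the 8-tap span, and the function names are illustrative assumptions, not the study's filter.

```python
import math


def rc_kernel(t, beta=0.4):
    """Raised-cosine impulse response; t in units of the input sample spacing."""
    if t == 0.0:
        return 1.0
    sinc = math.sin(math.pi * t) / (math.pi * t)
    denom = 1.0 - (2.0 * beta * t) ** 2
    if abs(denom) < 1e-12:
        return (math.pi / 4.0) * sinc  # limiting value at t = +/- 1/(2*beta)
    return sinc * math.cos(math.pi * beta * t) / denom


def circular_interpolate(frame, position, taps=8, beta=0.4):
    """Interpolate at a fractional position, treating `frame` as one period."""
    n_len = len(frame)
    n = math.floor(position)
    half = taps // 2
    total = 0.0
    for m in range(n - half + 1, n + half + 1):
        total += frame[m % n_len] * rc_kernel(position - m, beta)  # modulo-N indexing
    return total


# A tone with exactly 3 cycles per 32-sample frame is periodic in the frame.
frame = [math.cos(2 * math.pi * 3 * m / 32) for m in range(32)]
print(circular_interpolate(frame, 31.5))  # close to cos(2*pi*3*31.5/32)
```

No samples from the previous frame are needed, which is the storage advantage noted below.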

The circular approach to interpolation is preferred to the linear approach because each frame is processed independently of the previous ones, leading to a simpler implementation with smaller storage requirements.

3.6 IFFTs OF DIFFERENT SIZES IN THE SAME PIPELINE PROCESSOR 

Several IFFTs of different sizes can be implemented simultaneously in a single pipeline. At every clock pulse, r samples are presented to the butterfly computational elements. The twiddle factors (phase shifts) used in the butterfly operations depend on the FFT size, the stage within the pipeline, and the index of the input samples. These twiddle factors are precomputed and stored in memory. A new factor may be used at every clock pulse, thus accommodating a variety of FFT sizes. Of course, the interstage reordering must properly match the samples before presenting them to the next butterfly element. These interstage reordering modules consist of delays and commutators, as mentioned previously. The amount of delay at a given stage in the pipeline is determined by the stage number. However, more flexibility in the commutator action is needed to implement a different output/input matching at every clock pulse. The commutator action can be greatly simplified by properly sequencing the frequency coefficients of the different carriers. Detailed circuit designs and timing diagrams for such an implementation are being developed under corporate sponsorship at COMSAT LABS.
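In software terms, a twiddle-factor memory indexed by FFT size and stage can be sketched as follows. This is a radix-2, decimation-in-time transform that reads its phase factors from tables precomputed for every supported size. It uses bit-reversed input ordering in place of the pipeline's delay/commutator reordering, which is not modeled, and the table layout and function names are hypothetical.

```python
import cmath


def build_twiddle_tables(sizes):
    """Precompute radix-2 twiddle factors for each power-of-two size and stage."""
    tables = {}
    for n in sizes:
        stages = n.bit_length() - 1  # log2(n)
        tables[n] = [[cmath.exp(-2j * cmath.pi * k / (2 << s)) for k in range(1 << s)]
                     for s in range(stages)]
    return tables


def bit_reverse(i, bits):
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft(x, tables):
    """Radix-2 DIT FFT whose phase factors come from the size-specific table."""
    n = len(x)
    bits = n.bit_length() - 1
    a = [0j] * n
    for i, v in enumerate(x):
        a[bit_reverse(i, bits)] = complex(v)
    for s, twiddles in enumerate(tables[n]):
        half = 1 << s
        for start in range(0, n, 2 * half):
            for k in range(half):
                t = twiddles[k] * a[start + k + half]
                u = a[start + k]
                a[start + k], a[start + k + half] = u + t, u - t
    return a


def ifft(spectrum, tables):
    """Inverse transform via conjugation, reusing the same forward tables."""
    n = len(spectrum)
    out = fft([v.conjugate() for v in spectrum], tables)
    return [v.conjugate() / n for v in out]


tables = build_twiddle_tables([8, 16, 32])  # one memory serving several sizes
```

Switching transform size only changes which table rows are read, which mirrors the text's point that the butterfly hardware is shared and only the stored factors and reordering differ.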
4.0 DIGITAL DEMODULATION

4.1 OVERVIEW 

This section describes a digital signal processing method for 
demodulating the individual carrier signals that are demultiplexed by a 
combination of FFT and IFFT processing. The signals are presented to the 
demodulator in the form of discrete time domain samples at a rate of two 
samples on each of two quadrature channels for each symbol interval. 
These samples are processed to recover the modulated data bits. To 
accomplish this, it is necessary to acquire and maintain both symbol 
timing and carrier frequency synchronization. A single processor is shared 
to demodulate all of the carriers. 

A block diagram of the demodulation processor is shown in Figure 4.1. Samples $X_k$ and $Y_k$ are derived at the output of a sample interpolator at a rate of two per symbol, with their phase controlled by a timing estimate $\hat{s}_n$ that maintains an alignment such that one sample occurs at the center of each symbol and the other at each symbol boundary. This process also compensates for the slip between the FFT/IFFT and demodulator clocks. The sample interpolator establishes the proper sample phase as the final step in the IFFT processing.

Symbol timing acquisition and synchronization are performed by the processors contained in the loop shown at the top of Figure 4.1. The acquisition process calculates an initial estimate of the timing phase error during the preamble segment of each received TDMA burst. This estimate is used to initialize an accumulator in the symbol synchronizer at the start of the traffic portion of the burst. The symbol synchronizer keeps the timing error near zero during the traffic portion of the burst by appropriately adjusting the value of $\hat{s}_n$. For continuous carriers, the acquisition function may be replaced by a timing search procedure.

Carrier acquisition and synchronization are performed by the processors shown in the lower half of Figure 4.1. The acquisition processor calculates initial estimates of the carrier phase, $\hat{\theta}$, and phase rate, $\dot{\hat{\theta}}$ (carrier frequency offset), during the preamble. These are used to initialize the carrier synchronizer, which maintains synchronization during the traffic portion of the burst. The output of the coherent demodulator consists of the samples taken at the center of each symbol, designated the even numbered samples, and those taken at symbol boundaries, designated the odd numbered samples. When symbol timing and carrier synchronization are properly maintained, the even numbered samples are taken at the optimum time (mid symbol) to cancel intersymbol interference and consequently provide the best possible sample values for making the bit decisions. The odd numbered samples occur at the boundaries between symbols and are consequently nearly zero when symbol transitions occur and at an absolute maximum when no transitions occur. Only the even numbered samples are used by the bit decision processor to determine the estimated bit outputs $\hat{A}_n$ and $\hat{B}_n$. Decision-directed feedback from the output of the bit decision processor is used in the carrier synchronization processing to aid in calculating the carrier phase error.


Detailed descriptions of the various processing steps for the 
acquisition and synchronization phases for symbol timing and carrier 
recovery are given in the following. 


4.2 ACQUISITION PROCESSING 
4.2.1 PREAMBLE STRUCTURE. 


2V(2C)sin( 27rtR /2) sin( co t +0 


0 - r 



FIGURE 4.2. CONVERSION TO BASEBAND 


Let the preamble be represented by a BPSK modulated carrier of 
power C having quadrature baseband components: 
$$X = \sqrt{2C}\,\sin(2\pi t R_s/2)\cos\theta \tag{1a}$$

$$Y = \sqrt{2C}\,\sin(2\pi t R_s/2)\sin\theta \tag{1b}$$

where $R_s$ is the symbol rate, $t$ is time, and $\theta$ is the phase offset between the signal carrier and the recovered carrier. For digital signal processing, the continuous function must be represented in discrete sampled-data form. Nyquist sampling theory shows that each symbol of each quadrature phase must be sampled twice. Samples are taken at times $t_k + \Delta t$, where $\Delta t$ is the time error in locating the desired sampling instant. Hence, the sampled-data form can be expressed as:


$$X_k = \sqrt{2C}\,\sin(2\pi t_k R_s/2 + e/2)\cos\theta \tag{2a}$$

$$Y_k = \sqrt{2C}\,\sin(2\pi t_k R_s/2 + e/2)\sin\theta \tag{2b}$$

where $e/2$ is the phase displacement between the signal symbol period and the sampling clock period. If the time error is $\Delta t$, then $e = 2\pi R_s \Delta t$. The two samples per symbol are classified into those taken at even and those taken at odd numbered sampling times. If the timing error $\Delta t = 0$, the even numbered sampling times are at mid symbol and the odd numbered ones at symbol boundaries. For the $n$th symbol, the even numbered samples are taken at times $t_{2n}$ and the odd numbered samples at times $t_{2n-1}$:

$$t_{2n} = (n + 1/2)T_s \tag{3a}$$

$$t_{2n-1} = nT_s \tag{3b}$$

$$T_s = 1/R_s \tag{3c}$$


Consequently, the sampled values for odd and even numbered sampling times during the preamble are:

$$Y_{\text{odd}} = Y_o = \sqrt{2C}\,(-1)^n\sin(e/2)\sin\theta \tag{4a}$$

$$X_{\text{odd}} = X_o = \sqrt{2C}\,(-1)^n\sin(e/2)\cos\theta \tag{4b}$$

$$Y_{\text{even}} = Y_e = \sqrt{2C}\,(-1)^n\cos(e/2)\sin\theta \tag{4c}$$

$$X_{\text{even}} = X_e = \sqrt{2C}\,(-1)^n\cos(e/2)\cos\theta \tag{4d}$$

The values of $X_k$ and $Y_k$ taken at successive half-symbol instants (spaced $1/(2R_s)$ apart, with $k = 2n-1$ for odd and $k = 2n$ for even samples) are:

$$X_k = \begin{cases} -\sqrt{2C}\cos\theta\sin(e/2) + \text{noise}, & k = 1,\ n = 1\\ -\sqrt{2C}\cos\theta\cos(e/2) + \text{noise}, & k = 2,\ n = 1\\ \sqrt{2C}\cos\theta\sin(e/2) + \text{noise}, & k = 3,\ n = 2\\ \sqrt{2C}\cos\theta\cos(e/2) + \text{noise}, & k = 4,\ n = 2 \end{cases} \tag{5}$$

$$Y_k = \begin{cases} -\sqrt{2C}\sin\theta\sin(e/2) + \text{noise}, & k = 1,\ n = 1\\ -\sqrt{2C}\sin\theta\cos(e/2) + \text{noise}, & k = 2,\ n = 1\\ \sqrt{2C}\sin\theta\sin(e/2) + \text{noise}, & k = 3,\ n = 2\\ \sqrt{2C}\sin\theta\cos(e/2) + \text{noise}, & k = 4,\ n = 2 \end{cases} \tag{6}$$

with each pattern repeating every 4 samples. Thus only 8 different values actually occur, and they repeat every two symbols. This result is evident in the sampled preamble shown in Figure 4.5. From these samples we wish to estimate the values of $\theta$ and $e$.

The sampled values of the preamble signal given above can be 
combined into the following relationships among the even and odd 
numbered samples: 
$$\text{SUM}X_{\text{odd}} = \sum X_o = X_1 - X_3 + X_5 - X_7 + \cdots \tag{7a}$$

$$\text{SUM}X_{\text{even}} = \sum X_e = X_2 - X_4 + X_6 - X_8 + \cdots \tag{7b}$$

$$\text{SUM}Y_{\text{odd}} = \sum Y_o = Y_1 - Y_3 + Y_5 - Y_7 + \cdots \tag{7c}$$

$$\text{SUM}Y_{\text{even}} = \sum Y_e = Y_2 - Y_4 + Y_6 - Y_8 + \cdots \tag{7d}$$

Substituting the sample values given in equations (5) and (6), and recognizing that they repeat in sets of four:

$$\sum X_o = -\sqrt{2C}\,N\cos\theta\sin(e/2) \tag{8a}$$

$$\sum X_e = -\sqrt{2C}\,N\cos\theta\cos(e/2) \tag{8b}$$

$$\sum Y_o = -\sqrt{2C}\,N\sin\theta\sin(e/2) \tag{8c}$$

$$\sum Y_e = -\sqrt{2C}\,N\sin\theta\cos(e/2) \tag{8d}$$

From these relationships, expressions can be written for carrier and symbol timing acquisition.
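The sketch below generates noiseless preamble samples from (4a)-(4d) for symbols $n = 1, \dots, N$ and forms the alternating sums (7a)-(7d), so that the closed forms (8a)-(8d) can be checked numerically. The function names and parameter values are arbitrary test choices.

```python
import math


def preamble_samples(C, theta, e, n_symbols):
    """Noiseless odd/even preamble samples per eqs. (4a)-(4d), for n = 1..N."""
    amp = math.sqrt(2.0 * C)
    xo, xe, yo, ye = [], [], [], []
    for n in range(1, n_symbols + 1):
        sign = (-1) ** n
        xo.append(amp * sign * math.sin(e / 2) * math.cos(theta))
        yo.append(amp * sign * math.sin(e / 2) * math.sin(theta))
        xe.append(amp * sign * math.cos(e / 2) * math.cos(theta))
        ye.append(amp * sign * math.cos(e / 2) * math.sin(theta))
    return xo, xe, yo, ye


def alternating_sum(values):
    """Eq. (7): v1 - v2 + v3 - ..., which undoes the (-1)^n alternation."""
    return sum(v if i % 2 == 0 else -v for i, v in enumerate(values))


C, theta, e, N = 1.0, 0.3, 0.4, 16
sums = [alternating_sum(v) for v in preamble_samples(C, theta, e, N)]
# Eq. (8a): sum X_o = -sqrt(2C) N cos(theta) sin(e/2)
print(sums[0], -math.sqrt(2 * C) * N * math.cos(theta) * math.sin(e / 2))
```

With noise added to the samples, the same sums average it down, which is why the preamble estimates improve with $N$.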



4.2.2 CARRIER ACQUISITION (DETERMINATION OF $\theta$ AND $d\theta/dt$)


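4.2.2.1 Determination of $\theta$.

From the expressions previously given for $\sum X_o$, $\sum X_e$, $\sum Y_o$, and $\sum Y_e$, the following can be written:

$$\left(\sum Y_o\right)^2 + \left(\sum Y_e\right)^2 = 2CN^2\sin^2\theta \tag{9a}$$

$$\left(\sum X_o\right)^2 + \left(\sum X_e\right)^2 = 2CN^2\cos^2\theta \tag{9b}$$

$$\tan^2\theta = \frac{\left(\sum Y_o\right)^2 + \left(\sum Y_e\right)^2}{\left(\sum X_o\right)^2 + \left(\sum X_e\right)^2} \tag{9c}$$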
The value of $\theta$ determined from the above expression is limited to the first quadrant, which results in a four-fold ambiguity. This can be reduced to a two-fold ambiguity by examining the sign of the expression

$$\Sigma_{XY1} = \sum X_e \sum Y_e + \sum X_o \sum Y_o \tag{10}$$

For noiseless conditions, equation (8) shows that $\sum Y_e = K\sum X_e$ and $\sum Y_o = K\sum X_o$, where $K = \tan\theta$. Substituting these expressions into equation (10) gives

$$\Sigma_{XY1} = K\left[\left(\sum X_e\right)^2 + \left(\sum X_o\right)^2\right] \tag{11}$$

Thus the sign of $\Sigma_{XY1}$ is the same as that of $K$ and determines whether $K$ is greater than or less than zero. If $K$ is greater than zero, the angle is in the first or third quadrant; if it is negative, the angle is in the second or fourth quadrant.
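A minimal sketch of (9c)-(11): take the first-quadrant angle from the ratio of squared sums, then use the sign of $\Sigma_{XY1}$ to choose between the first/third and second/fourth quadrant pairs. The result lies in $[-\pi/2, \pi/2]$, so the remaining two-fold ($\pi$) ambiguity is left for the unique word, as the text describes. The helper `sums_from_eq8` and the test values are hypothetical.

```python
import math


def carrier_phase_from_sums(sxo, sxe, syo, sye):
    """Carrier phase modulo pi from the preamble sums, eqs. (9c)-(11)."""
    num = syo ** 2 + sye ** 2  # proportional to sin^2(theta), eq. (9a)
    den = sxo ** 2 + sxe ** 2  # proportional to cos^2(theta), eq. (9b)
    if den == 0.0:
        return math.pi / 2
    theta0 = math.atan(math.sqrt(num / den))  # first-quadrant value, eq. (9c)
    sxy1 = sxe * sye + sxo * syo              # eq. (10); sign equals sign of tan(theta)
    return theta0 if sxy1 >= 0 else -theta0


def sums_from_eq8(C, theta, e, N):
    """Noiseless sums per eqs. (8a)-(8d)."""
    a = -math.sqrt(2 * C) * N
    return (a * math.cos(theta) * math.sin(e / 2), a * math.cos(theta) * math.cos(e / 2),
            a * math.sin(theta) * math.sin(e / 2), a * math.sin(theta) * math.cos(e / 2))


print(carrier_phase_from_sums(*sums_from_eq8(1.0, 2.5, 0.4, 16)))  # about -0.6416, i.e. 2.5 - pi
```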



FIGURE 4.3. AMBIGUITY REMOVAL DIAGRAM FOR CARRIER PHASE. 


4.2.2.2 Determination of $\dot{\hat{\theta}} = E(d\theta/dt)$.

To determine the rate of change of carrier phase (which corresponds to the frequency offset between the carrier and the local reference applied during the acquisition phase), the preamble is divided into halves and the value of $\theta$ is calculated separately in each half. Let the value in the first half be $\theta_1$ and in the second half $\theta_2$. The calculation process inherently determines each estimated value at the center of its half. Hence the estimated value of the derivative is

$$\dot{\hat{\theta}} = E(d\theta/dt) = 4(\theta_2 - \theta_1)R_s/N \tag{12}$$

and the estimated value of the phase angle at the end of the preamble is

$$\hat{\theta} = E(\theta) = (\theta_2 + \theta_1)/2 + (\theta_2 - \theta_1) = (3\theta_2 - \theta_1)/2 \tag{13}$$
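The extrapolation in (12) and (13) can be sketched as below. Following the later description of the timing estimate, the difference is divided by half the preamble duration (the spacing between the two half-preamble centers), passed here as `preamble_duration_s`; this matches (12) if $N$ is read as the number of half-symbol samples in the preamble, which is our reading rather than a statement in the text. Because each half-preamble estimate is known only modulo $\pi$, the sketch also folds the difference into $[-\pi/2, \pi/2)$; that unwrapping step is an added assumption.

```python
import math


def carrier_phase_and_rate(theta1, theta2, preamble_duration_s):
    """End-of-preamble phase and phase rate from half-preamble estimates, eqs. (12)-(13)."""
    d = (theta2 - theta1 + math.pi / 2) % math.pi - math.pi / 2  # fold into [-pi/2, pi/2)
    rate = d / (preamble_duration_s / 2)  # half-preamble centers are T/2 apart
    theta_end = theta2 + d / 2            # equals (3*theta2 - theta1)/2 when no folding
    return theta_end, rate


print(carrier_phase_and_rate(0.1, 0.2, 2e-3))  # roughly (0.25, 100.0): rad, rad/s
```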



FIGURE 4.4. DETERMINATION OF INITIAL CARRIER PHASE AND PHASE RATE 

4.2.3 CLOCK ACQUISITION (DETERMINATION OF $\hat{e} = E(e)$ AND $\dot{\hat{e}} = E(de/dt)$)

4.2.3.1 Determination of $e$.

$$\left(\sum X_o\right)^2 + \left(\sum Y_o\right)^2 = 2CN^2\sin^2(e/2)$$

$$\left(\sum X_e\right)^2 + \left(\sum Y_e\right)^2 = 2CN^2\cos^2(e/2)$$

$$\tan^2(e/2) = \frac{\left(\sum X_o\right)^2 + \left(\sum Y_o\right)^2}{\left(\sum X_e\right)^2 + \left(\sum Y_e\right)^2}$$

The value of $e/2$ determined from the above expression is limited to the first quadrant. The symbol phase therefore has a two-fold ambiguity, lying either in the range $0 < e/2 < \pi/2$ or in $-\pi/2 < e/2 < 0$, which correspond to the ranges $0 < e < \pi$ and $-\pi < e < 0$, respectively. This ambiguity can be eliminated by examining the sign of the expression

$$\Sigma_{XY2} = \sum X_e \sum X_o + \sum Y_e \sum Y_o$$

which, from (8), equals $CN^2\sin e$. If $\Sigma_{XY2}$ is greater than zero, the angle $e$ is in the interval $0 < e < \pi$; if it is less than zero, it is in the interval $-\pi < e < 0$.

An estimate of the derivative, $\dot{\hat{e}}$, is obtained by the same method used previously to estimate $\dot{\hat{\theta}}$, i.e., by dividing the preamble into halves, calculating an estimate of $e$ in each half, and dividing the difference by half the duration of the preamble. This leads to the following expressions for the expected values of $e$ and $de/dt$ extrapolated to the end of the preamble:

$$\dot{\hat{e}} = E(de/dt) = 4(e_2 - e_1)R_s/N \tag{15}$$

and the estimated value of the timing phase at the end of the preamble is

$$\hat{e} = E(e) = (e_2 + e_1)/2 + (e_2 - e_1) = (3e_2 - e_1)/2 \tag{16}$$
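The timing-phase estimate can be sketched in the same way as the carrier phase: the ratio of squared sums gives $|e|/2$, and the sign of $\Sigma_{XY2}$ restores the sign of $e$. The helper `sums_from_eq8` builds noiseless sums from (8a)-(8d) for arbitrary test values; with noise, the same function would simply operate on the measured sums.

```python
import math


def timing_phase_from_sums(sxo, sxe, syo, sye):
    """Symbol timing phase e in (-pi, pi) from the preamble sums."""
    num = sxo ** 2 + syo ** 2  # proportional to sin^2(e/2)
    den = sxe ** 2 + sye ** 2  # proportional to cos^2(e/2)
    half = math.atan(math.sqrt(num / den)) if den > 0 else math.pi / 2  # |e|/2
    sxy2 = sxe * sxo + sye * syo  # proportional to sin(e)
    return 2 * half if sxy2 >= 0 else -2 * half


def sums_from_eq8(C, theta, e, N):
    a = -math.sqrt(2 * C) * N
    return (a * math.cos(theta) * math.sin(e / 2), a * math.cos(theta) * math.cos(e / 2),
            a * math.sin(theta) * math.sin(e / 2), a * math.sin(theta) * math.cos(e / 2))


e1 = timing_phase_from_sums(*sums_from_eq8(1.0, 2.5, -0.6, 8))  # first half
e2 = timing_phase_from_sums(*sums_from_eq8(1.0, 2.5, -0.5, 8))  # second half
print(e1, e2, (3 * e2 - e1) / 2)  # end-of-preamble estimate per eq. (16)
```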

4.2.4 INITIALIZATION OF THE TRACKING PROCESSING. 

As a result of the acquisition processing just described, the estimated values of the recovered carrier phase offset $\hat{\theta}$, carrier frequency offset $\dot{\hat{\theta}}$, symbol timing phase offset $\hat{e}$, and symbol frequency offset $\dot{\hat{e}}$ have been determined at the instant marking the end of the preamble and the beginning of the traffic. Although an expression has been derived above for $\dot{\hat{e}}$, it is not used in the symbol tracking processing. These values are installed as the initial values in the tracking process. This establishes the carrier phase with a two-fold ambiguity, still to be resolved by examining the polarities of the quadrature-modulated UWs*, and establishes the carrier frequency, symbol phase, and symbol frequency within a margin of error determined by the noise conditions. The resulting symbol phase adjustment causes the even numbered samples to occur at the center of the symbol period and the odd numbered samples at the symbol period boundaries, as illustrated in Figure 4.5.

* The UW can actually resolve a four-fold phase ambiguity if needed.

FIGURE 4.5. LOCATIONS OF ODD AND EVEN NUMBERED SAMPLES
RELATIVE TO THE PREAMBLE SYMBOL SIGNAL.

4.3 SYNCHRONIZATION - TRACKING PROCESSING. 

4.3.1 QPSK MODULATED SIGNAL REPRESENTATION. 

The QPSK signal can be represented by the relationship

$$Q(t) = \sqrt{2C}\,\cos(\omega_c t + \theta_c - \lambda) \tag{17}$$

where $C$ is the carrier power, $\omega_c$ the carrier frequency, $\theta_c$ the carrier phase, and $\lambda$ the modulation angle. Depending on the modulating information, the angle $\lambda$ can assume the values $\pi/4$, $3\pi/4$, $5\pi/4$, or $7\pi/4$. These angles result from the assumption that the modulated signal is the sum of two quadrature signals, $A\cos\omega_c t$ and $B\sin\omega_c t$, where $A$ represents the bits of the message transmitted on the X channel and $B$ the bits transmitted on the Y channel. $A$ and $B$ take on the values $\pm 1$ to represent a zero or a one. The resulting signal phases are shown in Figure 4.6.

FIGURE 4.6. QPSK CARRIER MODULATION PHASES

In terms of the modulation angle $\lambda$, it is evident that $A$ and $B$ can be expressed as

$$A = \sqrt{2}\cos\lambda \tag{18a}$$

$$B = \sqrt{2}\sin\lambda \tag{18b}$$

Consequently, the modulated signal can be expressed as

$$Q(t) = \sqrt{C}\,\left[A\cos(\omega_c t + \theta_c) + B\sin(\omega_c t + \theta_c)\right] \tag{19}$$

This signal is quadrature demodulated by multiplying it by $\cos(\omega_c t + \hat{\theta}_c)$ to recover the X channel and by $\sin(\omega_c t + \hat{\theta}_c)$ to recover the Y channel, and retaining the low-pass difference-frequency components. The recovered low-pass signal can be expressed as a vector:

$$Z = \sqrt{2C}\,e^{j(\theta + \lambda)} = X + jY \tag{20}$$

where

$$X = \sqrt{2C}\,\cos(\theta + \lambda) = \sqrt{C}\,\left[A\cos\theta - B\sin\theta\right] \tag{21a}$$

$$Y = \sqrt{2C}\,\sin(\theta + \lambda) = \sqrt{C}\,\left[B\cos\theta + A\sin\theta\right] \tag{21b}$$
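Equations (20) and (21) say that a carrier phase error simply rotates the point $\sqrt{C}(A + jB)$ in the complex plane. A short check using Python's built-in complex numbers, with an illustrative function name and arbitrary test values:

```python
import cmath
import math


def demodulated_sample(a, b, theta, C=1.0):
    """Z = X + jY = sqrt(C) * (A + jB) * exp(j*theta), eqs. (20)-(21)."""
    return math.sqrt(C) * complex(a, b) * cmath.exp(1j * theta)


z = demodulated_sample(1, -1, 0.1, C=2.0)
x_expected = math.sqrt(2.0) * (1 * math.cos(0.1) - (-1) * math.sin(0.1))  # eq. (21a)
y_expected = math.sqrt(2.0) * ((-1) * math.cos(0.1) + 1 * math.sin(0.1))  # eq. (21b)
print(z, x_expected, y_expected)
```

The magnitude of $Z$ is $\sqrt{2C}$ regardless of $\theta$, consistent with (20).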

The expressions given above are for a continuous representation. For a digital demodulator implementation, the signal must be represented in discrete sampled-data form. To represent the quadrature modulated information content, it is sufficient that each of the two phases be sampled twice during each symbol interval of duration $T_s = 1/R_s$, with the samples equally spaced. For optimum recovery of the information, assuming Nyquist filtering, one sample should occur at mid symbol and the other at the end of the symbol, where the transition to the next symbol occurs. Sampling at mid symbol is optimum with Nyquist filtering because, at the instant of sampling, all intersymbol interference contributions are theoretically zero and, in practice, very nearly nulled.

During the preamble, $A = B$ and the modulation is a binary alternating sequence of +1 and -1 values. Hence, transitions of $\pi$ radians occur at each symbol boundary. When the resulting signal is quadrature demodulated to a low-pass band of width slightly greater than $R_s/2$, the resulting quadrature signal appears to be a sinusoid of frequency $R_s/2$. This feature has already been discussed in the section devoted to acquisition processing.

If sampling takes place with a timing offset of $\Delta t$ relative to the symbols, the corresponding phase offset at the frequency of the symbol rate $R_s$ is $e = 2\pi\Delta t/T_s = 2\pi\Delta t R_s$. Assuming that when bit transitions occur, i.e., $B_n = -B_{n-1}$, the low-pass transition signal is approximated by sinusoid-shaped pulses of half period $T_s$, the sampled values are given by:
$$X_k = \sqrt{2C}\,\sin(\pi R_s t_k + e/2)\cos(\theta + \pi/4) \tag{22a}$$

$$Y_k = \sqrt{2C}\,\sin(\pi R_s t_k + e/2)\sin(\theta + \pi/4) \tag{22b}$$

Samples can be classified as even or odd depending on the sampling times, expressed as follows for the $n$th symbol:

$$\text{Even sample times: } t_{2n} = (n + 1/2)T_s \tag{23a}$$

$$\text{Odd sample times: } t_{2n-1} = nT_s \tag{23b}$$


4.3.2 TRACKING OF SYMBOL TIMING AND CARRIER FREQUENCY. 

4.3.2.1 Symbol Timing Tracking. 

As a consequence of the acquisition process, the offset between the symbols of the modulated signal and the sampling times is determined, and at the end of the preamble it is applied as the initial value to start the tracking process. During tracking, this offset is maintained such that the odd numbered samples are at the locations of transitions and the even numbered samples at the center of the symbol period. This is accomplished by using the output $Z = X + jY$ from the coherent demodulator. Since the timing error is very small, the even numbered sample values are not greatly affected by it; however, the odd numbered sample values are approximately proportional to the small timing error.

From the previous discussion of the preamble signal, it was demonstrated that the baseband signal recovered from a BPSK carrier is a sinusoid of period $2T_s$, and that in the vicinity of the axis crossings which mark the transitions from one symbol to the next, the value of the odd numbered sample is $\pm\sqrt{2C}\sin(e/2)$, where the sign depends on whether the transition is positive or negative. In the traffic portion of the QPSK-modulated TDMA carrier burst, the signal on either quadrature channel results from modulation by binary signals that reverse phase arbitrarily at symbol boundaries, depending on the information content. When there is no change in the modulation value, the odd samples yield about the same value as the even samples. A modulation value change (known as a symbol transition) causes a zero crossing in the vicinity of the odd sampling time. Modulation changes occur when $A_n = -A_{n-1}$ or $B_n = -B_{n-1}$ (a phase reversal transition occurs on either or both channels). The situation is illustrated in Figure 4.7. It can be assumed that the signal during the transition is a sinusoid similar to that for transitions experienced with simple BPSK. Hence the expressions for the signals on the X and Y channels for such transitions are:

$$Y(t) = \left[\frac{B_n - B_{n-1}}{2}\right]\left[\sqrt{C}\sin(\pi R_s t)\right]\left[u(t - nT_s) - u(t - (n+1)T_s)\right] \tag{24a}$$

$$X(t) = \left[\frac{A_n - A_{n-1}}{2}\right]\left[\sqrt{C}\sin(\pi R_s t)\right]\left[u(t - nT_s) - u(t - (n+1)T_s)\right] \tag{24b}$$

where $u(t - nT_s)$ and $u(t - (n+1)T_s)$ are unit step functions.

When a transition occurs for the $n$th symbol, the odd numbered samples on the X and Y channels, displaced by an error $e$, are:

$$Y_{2n-1} = \left[\frac{B_n - B_{n-1}}{2}\right]\sqrt{C}\sin(e/2) \tag{25a}$$

$$X_{2n-1} = \left[\frac{A_n - A_{n-1}}{2}\right]\sqrt{C}\sin(e/2) \tag{25b}$$

Modulation transitions are identified by the conditions $\hat{A}_n = -\hat{A}_{n-1}$ and/or $\hat{B}_n = -\hat{B}_{n-1}$ at the A and B decision outputs of the demodulator.

If a transition is detected, the output at the odd numbered sampling instant corresponding to the symbol responsible for the transition is approximated by the above relationships. Thus, decisions on A and B can be used to convert the odd numbered samples of X and Y into estimates of the symbol timing error for use in symbol synchronization. These principles are used below with a first-order phase lock loop to track symbol timing during the traffic portion of a TDMA burst. The same principles can be used to acquire phase, but more slowly than with the acquisition method previously described.
As illustrated by equations (25a) and (25b), the samples on both quadrature components taken at odd numbered sample times have a magnitude proportional to $e/2$ for small timing phase errors when transitions take place. When no transition occurs, the odd sample times have large values given approximately by

$$Y_{2n-1} \approx \sqrt{C}\,B_n, \quad B_n = B_{n-1} \tag{26a}$$

$$X_{2n-1} \approx \sqrt{C}\,A_n, \quad A_n = A_{n-1} \tag{26b}$$

When the odd sample values are multiplied by the polarity of the first difference of the detected bit decisions, the resulting values always have the same sign as the timing error, so the sign reverses between lagging and leading conditions. Furthermore, since transitions occur only when there is a bit reversal, the method eliminates the contributions due to large sample values, which would otherwise destroy the desired property. This is illustrated in Figure 4.7.

The transition detector is implemented by determining the first differences of the decisions made at the output of the decision detector. This process is represented by the following expressions:

$$Q_{2n-1} = (\hat{B}_n - \hat{B}_{n-1})/2 \tag{27a}$$

$$P_{2n-1} = (\hat{A}_n - \hat{A}_{n-1})/2 \tag{27b}$$

The values of $Q_{2n-1}$ as a function of $\hat{B}_n$ and $\hat{B}_{n-1}$ are given by the following logic table:

$\hat{B}_{n-1}$    $\hat{B}_n$    $Q_{2n-1}$
      1               1            0
      1              -1           -1
     -1               1            1
     -1              -1            0

A similar table exists for $P_{2n-1}$ as a function of $\hat{A}_n$ and $\hat{A}_{n-1}$.

Once each symbol, the product of the transition detector outputs and the odd numbered samples yields the following value for the error estimate:

$$\hat{e}_n = \left[Q_{2n-1}Y_{2n-1} + P_{2n-1}X_{2n-1}\right]/\sqrt{C} \tag{28}$$

which can be expected to have the value

$$\hat{e}_n \approx \left[Q_{2n-1}^2 + P_{2n-1}^2\right]e/2 \tag{29}$$

When no transition occurs on either channel, the error value is zero and consequently makes no contribution to the correction process.
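A minimal sketch of the transition detector and error estimate, (27)-(29). The odd samples are modeled with (25) when a transition occurs and with (26) when it does not, using arbitrary test values and assuming correct bit decisions; the function name is hypothetical.

```python
import math


def timing_error_estimate(x_odd, y_odd, a_prev, a_cur, b_prev, b_cur, C=1.0):
    """Eq. (28): odd samples weighted by the first differences of the decisions."""
    p = (a_cur - a_prev) / 2  # eq. (27b): 0 or +/-1
    q = (b_cur - b_prev) / 2  # eq. (27a): 0 or +/-1
    return (q * y_odd + p * x_odd) / math.sqrt(C)


C, e = 4.0, 0.02
# Transitions on both channels: odd samples follow eq. (25).
x_odd = ((-1) - 1) / 2 * math.sqrt(C) * math.sin(e / 2)  # A: +1 -> -1
y_odd = (1 - (-1)) / 2 * math.sqrt(C) * math.sin(e / 2)  # B: -1 -> +1
print(timing_error_estimate(x_odd, y_odd, 1, -1, -1, 1, C))  # about 2*(e/2) = 0.02
# No transitions: large odd samples per eq. (26), but the estimate is zero.
print(timing_error_estimate(math.sqrt(C), -math.sqrt(C), 1, 1, -1, -1, C))  # 0.0
```

The first result matches (29) with $Q^2 + P^2 = 2$; reversing the sign of the timing error reverses the sign of the estimate.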

4.3.2.2 Symbol Synchronizer Operation 

The symbol synchronizer is shown in Figure 4.8. It consists of a phase detector, which obtains an estimate of the phase error $\hat{e}_n$ every symbol period, followed by an amplifier of gain $GT_s$, which in turn is followed by an accumulator that sums the amplified phase error estimates. The output of the phase detector for the $n$th symbol is

$$\hat{e}_n = s_n - \hat{s}_{n-1} \tag{30}$$

where $s_n$ is the received symbol phase and $\hat{s}_{n-1}$ the current estimate of the symbol phase. The output timing phase is updated by the accumulator to yield

$$\hat{s}_n = \hat{s}_{n-1} + GT_s\,\hat{e}_n \tag{31}$$

The corrected sampling phase estimates, $\hat{s}_n$, are supplied to the interpolator stage of the IFFT, where they are used to adjust the phase of the sampling clock, and hence the sampling times, so that the value of $\hat{e}_n$ is driven to values that meander about zero with small magnitude.
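The first-order loop of (30)-(31) can be simulated in a few lines. The sketch assumes a noiseless phase detector that returns the true error, a constant received symbol phase, and the gain $GT_s = 0.0157$ quoted later in this section; the function name and values are illustrative.

```python
def track_symbol_phase(received_phase, gain_ts, n_symbols, s_hat=0.0):
    """Iterate eqs. (30)-(31) with an ideal (noise-free) phase detector."""
    for _ in range(n_symbols):
        e_hat = received_phase - s_hat   # eq. (30)
        s_hat = s_hat + gain_ts * e_hat  # eq. (31): accumulator update
    return s_hat


print(track_symbol_phase(0.5, 0.0157, 500))  # residual shrinks by (1 - GTs) per symbol
```

Without noise the residual error decays geometrically by the factor $1 - GT_s$ per symbol; a small gain trades slower convergence for more averaging of the noisy per-symbol estimates.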
FIGURE 4.8. SYNCHRONIZER FOR SYMBOL TRACKING
The noise bandwidth $B_N$ and loop bandwidth $B_L$ of the discrete PLL are related to the gain $G$ by

$$B_N = 2B_L \approx G/2 \tag{32}$$

Consequently, the averaging time is approximately

$$t_e = 2/G \tag{33}$$

which can be expressed in terms of symbols as

$$n_e = t_e/T_s = 2/(GT_s) \tag{34}$$

The variance of the estimate of $e$ obtained during each symbol interval averages

$$\sigma_1^2 = 2/(E_s/N_0) \tag{35}$$

Consequently, the variance over the smoothing interval $t_e$ (with $n_e$ symbols) averages

$$\sigma_e^2 = 2/\left[n_e(E_s/N_0)\right] \tag{36}$$

As a typical application, assume that $GT_s$ is set to yield a value of $n_e$ that results in a tracking error of $T_s/100$ when $E_s/N_0 = 4$, corresponding to 6 dB. Then, since $e = 2\pi\Delta t/T_s$,

$$\sigma_e^2 = (2\pi)^2(0.01)^2 = 1/253 \tag{37}$$

With $E_s/N_0 = 4$, the averaging time in terms of symbols is

$$n_e = 253/2 \approx 127 \tag{38}$$

and consequently the gain of the discrete PLL is

$$GT_s = 2/n_e = 0.0157$$
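The numbers in (37)-(38) and the resulting gain can be reproduced with a few lines of arithmetic; the inputs (a tracking error of $T_s/100$ and $E_s/N_0 = 4$) are those of the example above, and the function name is illustrative.

```python
import math


def symbol_loop_gain(timing_error_fraction, es_n0):
    """Return (sigma_e^2, n_e, G*Ts) from eqs. (34), (36) and (37)."""
    var_e = (2 * math.pi * timing_error_fraction) ** 2  # eq. (37), e = 2*pi*dt/Ts
    n_e = 2.0 / (var_e * es_n0)                         # eq. (36) solved for n_e
    gain_ts = 2.0 / n_e                                 # eq. (34) solved for G*Ts
    return var_e, n_e, gain_ts


var_e, n_e, gain_ts = symbol_loop_gain(0.01, 4.0)
print(1 / var_e, n_e, gain_ts)  # roughly 253, 127 and 0.0158
```

The unrounded gain is about 0.0158; the value 0.0157 in the text follows from rounding $n_e$ up to 127 first.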
4.3.2.3 Carrier Phase Tracking

During the traffic data portion of a TDMA burst, a sampled-data second-order phase lock loop, shown in Figure 4.9, is used to maintain carrier synchronization. A second-order loop contains two accumulators, one calculating the estimated carrier phase $\hat{\theta}$ and the other the estimated carrier phase rate $\dot{\hat{\theta}}$, which of course is frequency. These accumulators are initialized by the estimates of phase and phase rate obtained from the preamble acquisition processing.

Because synchronization has been acquired, the even numbered samples are located very near the mid-symbol position. Under these circumstances, the actual carrier phase is $\theta$, the estimated carrier phase is $\hat{\theta}$, and there is a small phase difference $\phi = \theta - \hat{\theta}$ between them. The quadrature modulation components for the $n$th symbol can then be expressed in terms of $\phi$ by

$$Y_{2n} = \sqrt{C}\,(B_n\cos\phi + A_n\sin\phi) \tag{41a}$$

$$X_{2n} = \sqrt{C}\,(A_n\cos\phi - B_n\sin\phi) \tag{41b}$$

When $\phi = 0$, the cross coupling between the channels becomes zero. Consequently, the binary decisions on the samples $Y_{2n}$ and $X_{2n}$ should be very reliable estimates of the modulation variables $B_n$ and $A_n$, respectively. Hence,

$$\hat{Y}_{2n} = \hat{B}_n \tag{42a}$$

$$\hat{X}_{2n} = \hat{A}_n \tag{42b}$$

Substituting the above relations into the following decision-feedback cross-product relation

$$F(n) = \left(\hat{X}_{2n}Y_{2n} - \hat{Y}_{2n}X_{2n}\right)/(2\sqrt{C}) \tag{43}$$

yields the result:
FIGURE 4.9. CARRIER PHASE TRACKING LOOP
$$F(n) = \left(A_n\hat{A}_n + B_n\hat{B}_n\right)(\sin\phi)/2 + \left(\hat{A}_nB_n - \hat{B}_nA_n\right)(\cos\phi)/2 \tag{44}$$

Consider the average of the above expression over a relatively large number of symbols. The cross-product terms $\hat{A}_nB_n$ and $\hat{B}_nA_n$ average to zero, since the bits comprising the information on the two quadrature channels are randomly related. The co-product terms $A_n\hat{A}_n$ and $B_n\hat{B}_n$ each average to 1 over the same averaging interval. There will be residual variance in the cross- and co-product terms, which depends on the length of the averaging interval and contributes to error in the estimate. Hence, provided the averaging interval is sufficiently long,

$$F(n) \approx \sin\phi_n \approx \phi_n \tag{45}$$
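A noiseless sketch of the decision-directed detector of (42)-(43) follows; the function name and the test phase error of 0.05 rad are illustrative. With no noise and correct decisions, $\hat{A}_n = A_n$ and $\hat{B}_n = B_n$, so the cross term in (44) cancels and each symbol already yields exactly $\sin\phi$; the averaging described above matters when noise perturbs the samples and occasionally the decisions.

```python
import math


def phase_error_estimate(x, y, C=1.0):
    """Decision-directed cross product F(n) of eq. (43)."""
    a_hat = 1.0 if x >= 0 else -1.0  # decision on X_2n, eq. (42b)
    b_hat = 1.0 if y >= 0 else -1.0  # decision on Y_2n, eq. (42a)
    return (a_hat * y - b_hat * x) / (2.0 * math.sqrt(C))


C, phi = 2.0, 0.05
for a in (1, -1):
    for b in (1, -1):
        x = math.sqrt(C) * (a * math.cos(phi) - b * math.sin(phi))  # eq. (41b)
        y = math.sqrt(C) * (b * math.cos(phi) + a * math.sin(phi))  # eq. (41a)
        print(a, b, phase_error_estimate(x, y, C))  # each equals sin(0.05)
```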



4.3.2.4 Carrier Synchronizer Operation.

The estimated value of the phase error, determined by the phase error detection method described above, is used to generate a new estimate, $\hat{\theta}_{n+1}$, of the carrier phase by means of the second-order discrete phase lock loop shown in Figure 4.9. For each new value of the phase error $\hat{\phi}_n$, the first summation loop generates an output $S_n$ given by the expression

$$S_n = \dot{\hat{\theta}}_n + K_1\hat{\phi}_n \tag{46}$$

The differential term $\dot{\hat{\theta}}_n = (\widehat{\Delta\theta/\Delta t})_n$ is the current estimate of the phase rate, which is the frequency offset between the actual carrier and the recovered carrier. The phase rate accumulator inside the first summation loop also computes a new phase rate estimate

$$\dot{\hat{\theta}}_{n+1} = \dot{\hat{\theta}}_n + K_1(K_2T_s)\hat{\phi}_n \tag{47}$$

$S_n$ is passed on to the second accumulator, where it is summed with the old value of the phase to generate a new value according to the relation

$$\hat{\theta}_{n+1} = \hat{\theta}_n + S_nT_s \tag{48}$$

When the phase lock loop is ideally locked, the phase rate estimate $\dot{\hat{\theta}}_n$ is equal to the frequency difference between the actual and recovered carrier, causing $\hat{\theta}_n$ to advance by an amount $\dot{\hat{\theta}}_nT_s$ for each symbol, which is precisely the amount needed to keep the error estimate equal to zero.
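The loop equations (46)-(48) can be exercised with a short simulation. The sketch assumes an ideal linear phase detector ($\hat{\phi}_n = \theta_n - \hat{\theta}_n$, per (45)), no noise, a constant 500 Hz frequency offset, $T_s = 10^{-6}$ s, and $K_1 = K_2 = 25\,000$ (a $\zeta = 1/2$ choice, since (49) gives $\zeta = 1/2$ when $K_1 = K_2$); the function name and values are illustrative. The phase-rate accumulator should converge to the frequency offset in rad/s.

```python
import math


def track_carrier(k1, k2, ts, theta0, freq_offset, n_symbols,
                  theta_hat=0.0, rate_hat=0.0):
    """Iterate eqs. (46)-(48) with an ideal, noise-free phase-error estimate."""
    phi = 0.0
    for n in range(n_symbols):
        theta = theta0 + freq_offset * n * ts       # actual carrier phase at symbol n
        phi = theta - theta_hat                     # phase error, eq. (45)
        s = rate_hat + k1 * phi                     # first summation loop, eq. (46)
        rate_hat = rate_hat + k1 * (k2 * ts) * phi  # phase rate accumulator, eq. (47)
        theta_hat = theta_hat + s * ts              # phase accumulator, eq. (48)
    return theta_hat, rate_hat, phi


offset = 2 * math.pi * 500.0  # 500 Hz offset in rad/s
_, rate_hat, last_phi = track_carrier(25000.0, 25000.0, 1e-6, 0.2, offset, 4000)
print(rate_hat, offset, last_phi)
```

Because the loop contains two accumulators, a constant frequency offset is tracked with zero steady-state phase error, which is the behavior the text describes for the ideally locked loop.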

The value of thus determined is supplied to the carrier phase 
corrector which is implemented as shown in Figure 4.10. This processor 
rotates the phase of the recovered carrier by 0^p keeping it aligned with 
the phase of the signal carrier. 

Performance of the phase lock loop can be expressed in terms of two parameters, the damping coefficient $\zeta$ and the natural undamped frequency $\omega_n$. These are related to the loop gain parameters $K_1$ and $K_2$ by the expressions

$$\zeta = \frac{\sqrt{K_1/K_2}}{2} \tag{49}$$

$$\omega_n = \sqrt{K_1 K_2} \tag{50}$$

The effective noise bandwidth $B_N$ of the phase lock loop, which is twice the low-pass bandwidth $B_L$, is given by the expression

$$B_N = 2B_L = \frac{\omega_n}{2}\left(2\zeta + \frac{1}{2\zeta}\right) \tag{51}$$

If $\zeta = 1/2$ is selected, the following relationships result:

$$K_1 = K_2 = \omega_n = B_N \tag{52}$$

The loop carrier-to-noise ratio, which determines the standard deviation of the recovered phase estimate, is

$$(C/N)_L = \frac{E_s}{N_0}\,\frac{R_s}{B_N} \tag{53}$$

where $E_s/N_0$ is the symbol-energy-to-noise-spectral-density ratio and $R_s$ is the symbol rate. A convenient relation that follows from this expression is

$$\frac{B_N}{R_s} = \frac{E_s/N_0}{(C/N)_L} \tag{54}$$

or, for $\zeta = 1/2$,

$$B_N = K_1 = K_2 = \omega_n = R_s\,\frac{E_s/N_0}{(C/N)_L} \tag{55}$$

Typically, to obtain a phase standard deviation of 3.2°, $(C/N)_L$ must be 160. Assuming further that $R_s = 10^6$ symbol/s and $E_s/N_0 = 6$ dB (a ratio of about 4),

$$B_N = K_1 = K_2 = \omega_n = \frac{4}{160}\times 10^{6} = 25\,000 \tag{56}$$

This result holds for a damping coefficient $\zeta = 1/2$; other values of the damping coefficient give other results.
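The chain from Equation 49 to Equation 56 can be checked with a few lines of Python. This is a sketch only: the helper name `loop_design` is ours, and the phase-jitter relation $\sigma \approx 1/\sqrt{2(C/N)_L}$ rad is our assumption (the report does not state it), chosen because it reproduces the 3.2° figure for $(C/N)_L = 160$.

```python
import math

def loop_design(Rs, es_n0, cn_loop, zeta=0.5):
    """Noise bandwidth and loop gains from Eqs. (49)-(55).

    Rs: symbol rate (symbol/s); es_n0: Es/N0 as a ratio (not dB);
    cn_loop: required loop carrier-to-noise ratio (C/N)_L.
    """
    Bn = Rs * es_n0 / cn_loop                        # Eq. (54)
    wn = 2 * Bn / (2 * zeta + 1 / (2 * zeta))        # Eq. (51) solved for wn
    K1 = 2 * zeta * wn                               # from Eqs. (49) and (50)
    K2 = wn / (2 * zeta)
    sigma_deg = math.degrees(1 / math.sqrt(2 * cn_loop))  # assumed jitter model
    return Bn, wn, K1, K2, sigma_deg

Bn, wn, K1, K2, sigma = loop_design(Rs=1e6, es_n0=4.0, cn_loop=160)
print(Bn, wn, K1, K2, round(sigma, 1))   # prints 25000.0 25000.0 25000.0 25000.0 3.2
```

Passing `es_n0=10**0.6` (exactly 6 dB, about 3.98) instead of the rounded value 4 gives $B_N$ of about 24 900; with $\zeta \neq 1/2$ the gains $K_1$ and $K_2$ are no longer equal.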

4.4 COMPUTATIONAL REQUIREMENTS. 

4.4.1 SYMBOL TIMING AND CARRIER ACQUISITION. 

For acquisition of symbol timing and carrier phase and frequency, the preamble of $N_p$ symbols is divided into halves, each containing $N_p/2$ symbols. For each half the following operations must be performed:

1) $N_p/2$ additions for each of the sums $\Sigma X_o$, $\Sigma X_e$, $\Sigma Y_o$, $\Sigma Y_e$, totaling $2N_p$.

2) 1 multiplication for each of the required products (for example $X_oY_o$), totaling 6.

3) 1 addition for each of $X_o^2+X_e^2$, $Y_o^2+Y_e^2$, $X_o^2+Y_o^2$, $X_e^2+Y_e^2$, $X_eY_e+X_oY_o$ and $X_oY_e+X_eY_o$, totaling 6.

4) 2 inverse tangent operations, implemented using PROMs.

Thus the total requirement for the entire preamble of $N_p$ symbols of each TDMA burst is $4N_p + 12$ additions, 12 multiplications and 4 inverse tangent operations.
4.4.2 SYMBOL AND CARRIER TRACKING.

a) Symbol Tracking.

Symbol tracking, also referred to as clock synchronization, requires the following:

1) 2 additions for every odd-numbered sample to compute

$$Q'_{2n-1} = (B'_n - B'_{n-1})/2, \qquad P'_{2n-1} = (A'_n - A'_{n-1})/2$$

Since these involve values of only ±1, they can be performed by logic and are not counted.

2) 2 multiplications and 1 addition for every odd-numbered sample to compute the clock error estimate

$$e_n = X'_{2n-1}P'_{2n-1} + Y'_{2n-1}Q'_{2n-1}$$

The multiplications involve values of only 0 and ±1 and are not counted.

3) 1 multiplication and 1 addition for every odd-numbered sample to update the timing estimate

$$\varepsilon'_n = \varepsilon'_{n-1} + K_s e_n$$

where $K_s$ is the clock loop gain.

Thus a total of 2 additions and 1 multiplication is needed for each odd-numbered sample to track the symbol timing. For a burst containing M traffic-segment symbols there are M odd samples, yielding a total requirement of 2M additions and M multiplications for each burst.

b) Carrier Tracking.

Carrier tracking, also referred to as carrier synchronization, requires the following:

1) 2 multiplications and 1 addition for each even-numbered sample (hence for each symbol) to compute the carrier phase error estimate F(n) of Equation 43.

2) 2 multiplications and 3 additions per symbol to update the carrier phase and frequency estimates of Equations 46 to 48, with the constant products $K_1T_s$ and $K_1K_2T_s^2$ precomputed:

$$S_nT_s = (K_1T_s)\,\hat\phi_n + \dot\theta_nT_s$$

$$\dot\theta_{n+1}T_s = \dot\theta_nT_s + (K_1K_2T_s^2)\,\hat\phi_n$$

$$\hat\theta_{n+1} = \hat\theta_n + S_nT_s$$

3) 4 multiplications and 2 additions per symbol for carrier phase rotation, to compute

$$X_k\cos\hat\theta_n - Y_k\sin\hat\theta_n \quad\text{and}\quad Y_k\cos\hat\theta_n + X_k\sin\hat\theta_n, \qquad k = 2n-1,\ 2n$$

Thus a total of 8 multiplications and 6 additions is needed for each symbol to perform the carrier tracking processing.

4.4.3 TOTAL DEMODULATOR REQUIREMENT.

The total requirement for processing the symbol timing and carrier acquisition and tracking is summarized in TABLE 4.1.

TABLE 4.1. SYMBOL TIMING AND CARRIER ACQUISITION AND TRACKING
COMPUTATIONAL REQUIREMENTS PER SYMBOL

COMPUTATIONAL REQUIREMENT        MULTIPLIES/SYMBOL   ADDITIONS/SYMBOL
SYMBOL TIMING & CARRIER ACQ.     12/N_p              4 + 12/N_p
SYMBOL TIMING & CARRIER TRACK    9                   8

From the above, the following relations can be derived for the computational requirement to process a shared QPSK TDMA carrier having a bit rate of $R_b$ (hence a symbol rate of $R_b/2$) among a community of TDMA terminals:

$$\text{MULTIPLIES/SEC} = \frac{R_b}{2}\,\frac{12 + 9M}{M + N_p + G}$$

$$\text{ADDITIONS/SEC} = \frac{R_b}{2}\,\frac{4N_p + 12 + 8M}{M + N_p + G}$$

where G is the number of symbols allowed for guard time between bursts, and each terminal is assumed to transmit a burst having a preamble $N_p$ symbols long and a traffic segment M symbols long.

For example, consider a TDMA system whose average burst is characterized by:

$R_b$ = 120.832 Mbit/s
$N_p$ = 128 symbols
M = 12,288 symbols (24 64-kbit/s channels, 8-ms frame)
G = 16 symbols

The resulting computational rates are:

MULTIPLIES/SEC = $538 \times 10^6$

ADDITIONS/SEC = $480 \times 10^6$
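The two rate formulas are simply the per-burst operation counts spread over the burst's symbol slots. A minimal Python check of the worked example follows; the function name `demod_rates` is hypothetical, and QPSK (2 bits/symbol) is assumed, as in the formulas.

```python
def demod_rates(Rb, Np, M, G):
    """Multiplies/s and additions/s for one shared QPSK TDMA carrier.

    Per burst: acquisition costs 12 multiplications and 4*Np + 12 additions
    (Section 4.4.1); tracking costs 9 multiplications and 8 additions per
    traffic symbol (Table 4.1). A burst occupies M + Np + G symbol slots,
    and the symbol rate is Rb / 2 for QPSK.
    """
    Rs = Rb / 2
    slots = M + Np + G
    mults = Rs * (12 + 9 * M) / slots
    adds = Rs * (4 * Np + 12 + 8 * M) / slots
    return mults, adds

m, a = demod_rates(120.832e6, Np=128, M=12288, G=16)
print(f"{m / 1e6:.1f} M mult/s, {a / 1e6:.1f} M add/s")
```

Because tracking dominates for long traffic segments, both rates approach 9 and 8 operations per symbol times the symbol rate as M grows.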

4.4.4 INTERPOLATION REQUIREMENT

Interpolation is performed on the output samples generated by the IFFT. It introduces the symbol timing correction and generates the samples $X_k$ and $Y_k$ that form the input to the demodulation process. The interpolation computational requirement is based on an interpolation filter with an impulse response that extends 4 symbols in each direction. At 2 samples per symbol, this requires 16 multiplications for each sample on each quadrature channel, yielding a total of 64 multiplications for each symbol. Thus the interpolation requirement is:

INTERPOLATION REQ. = $64 \times R_s$ multiplications/s

For the 120.832 Mbit/s composite rate of the example above ($R_s$ = 60.416 Msymbol/s), this is $3.866 \times 10^9$ multiplications/s.
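To make the 16-multiplications-per-sample figure concrete, the sketch below interpolates one quadrature channel at an arbitrary fractional sample position using a Hann-windowed sinc kernel spanning 4 symbols either side at 2 samples/symbol. The report does not specify the impulse response, so the windowed-sinc choice, the function names and the test signal are all assumptions; any 16-tap kernel has the same multiplication count.

```python
import math

def sinc(x):
    """Normalised sinc, sin(pi x) / (pi x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def interpolate(x, t, half_span=8):
    """Estimate a channel's value at fractional sample position t.

    Uses 2 * half_span = 16 taps (4 symbols each side at 2 samples/symbol),
    i.e. 16 multiply-accumulates per output sample per quadrature channel.
    """
    n0 = math.floor(t)
    mu = t - n0                                  # fractional offset in [0, 1)
    acc = norm = 0.0
    for k in range(-half_span + 1, half_span + 1):
        d = k - mu                               # tap distance from t, in samples
        h = sinc(d) * math.cos(math.pi * d / (2 * half_span)) ** 2  # Hann window
        acc += h * x[n0 + k]
        norm += h
    return acc / norm                            # normalise to unity DC gain

# Hypothetical test signal: a slow cosine sampled at integer instants
x = [math.cos(2 * math.pi * 0.05 * n) for n in range(200)]
print(interpolate(x, 103.3), math.cos(2 * math.pi * 0.05 * 103.3))
```

In hardware the normalised coefficients would presumably be precomputed for each fractional offset $\mu$, leaving only the 16 multiply-accumulates per sample counted above.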

It is important to point out that the 120 Mbit/s TDMA example discussed above represents operation in a broadband channel 80 MHz wide and would require neither FFT/IFFT nor interpolation processing if it were the only carrier to be processed. These processing elements are needed only when the signal to be processed is a composite of many different carriers.
4.5 DEMODULATION OF BPSK, 8-PSK AND OFFSET-QPSK

4.5.1 GENERAL 

This section describes the operation of a completely digital 
demodulator for BPSK, 8-PSK and offset-QPSK. Because of the strong 
similarity with the QPSK demodulator, which was previously described in 
great detail, the description here is abridged. The presentation indicates 
the differences compared with the QPSK demodulator thereby avoiding 
duplicating a large body of identical material. 

4.5.2 BPSK DEMODULATION 

4.5.2.1 Acquisition Processing.

The acquisition process is identical to that used in QPSK both for the 
carrier and the clock. 

4.5.2.2 Tracking Processing. 

The tracking loops are identical to those used for QPSK, but the estimates fed to these loops are slightly different. Referring to Equation 28, there is a similar equation here, except that $Q'_n = 0$ since only 1 bit is transmitted per symbol in BPSK. The error estimate at the input of the clock loop is therefore:

$$e_n = X'_{2n-1}P'_{2n-1} \tag{57}$$

Similarly, in Equation 43 Y'^2n ” ® estimate at the input of 

the carrier loop is: 


F(n) - X^2n ^an /2VC (58) 

It is clear from the above that the differences between the QPSK and BPSK demodulators are very minor, and that a QPSK demodulator can easily be modified via microprocessor control to demodulate BPSK and vice versa.

4.5.3 8-PSK DEMODULATION. 

4.5.3.1 Acquisition Processing.
The acquisition process is identical to that used in QPSK both for the
carrier and the clock. 

4.5.3.2 Tracking Processing. 

The tracking loops are also identical to the ones used for QPSK, as is the error estimate supplied to the carrier loop. Only the clock error estimate is different, because of the multiphase nature of the 8-PSK signals. This manifests itself when a transition occurs from one octal symbol to another: the transitions on the X and Y channels are no longer simple zero crossings, and several transition levels are possible. Referring to Figure 4.11 and denoting the estimates for symbols n-1 and n on the X channel as $A'_{n-1}$ and $A'_n$ respectively (and similarly $B'_{n-1}$ and $B'_n$ for the Y channel) yields

$$M'_n = (A'_n + A'_{n-1})/2 \tag{59}$$

$$N'_n = (B'_n + B'_{n-1})/2 \tag{60}$$

which are the estimated transition levels, and

$$P'_n = (A'_n - A'_{n-1})/2 \tag{61}$$

$$Q'_n = (B'_n - B'_{n-1})/2 \tag{62}$$

which are half the transition magnitudes.

FIGURE 4.11 8-PSK TRANSITION ON THE X CHANNEL

Next, form an error estimate based on transition detections, weighted to assign more weight to large transitions, as follows:

$$e_n = (X_n - M'_n)\,P'_n + (Y_n - N'_n)\,Q'_n \tag{63}$$

This error estimate is then fed to a first-order digital loop identical to that used for QPSK. Finally, the decision rule differs from that of QPSK but is easily implemented. From the above discussion of 8-PSK it is concluded that, with little effort, a QPSK demodulator can be modified via microprocessor control to demodulate 8-PSK signals.
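The transition-weighted error of Equations 59 to 63 is easy to sketch. In the Python below, the straight-line transition used to synthesise the mid-symbol sample is a hypothetical stand-in for the filtered waveform of Figure 4.11, and `psk8_clock_error` and `linear_mid` are our names. The sketch shows that the error is zero when the mid-symbol sample falls exactly on the transition level, and that its sign follows the timing offset, weighted by the squared transition magnitude.

```python
import math

def psk8_clock_error(a_prev, a_cur, b_prev, b_cur, x_mid, y_mid):
    """Clock error for 8-PSK from decided symbols n-1, n and mid-symbol samples."""
    m = (a_cur + a_prev) / 2      # X transition level, Eq. (59)
    nl = (b_cur + b_prev) / 2     # Y transition level, Eq. (60)
    p = (a_cur - a_prev) / 2      # half X transition magnitude, Eq. (61)
    q = (b_cur - b_prev) / 2      # half Y transition magnitude, Eq. (62)
    return (x_mid - m) * p + (y_mid - nl) * q    # Eq. (63)

def linear_mid(prev, cur, tau):
    """Hypothetical straight-line transition sampled at 0.5 + tau of a symbol."""
    return prev + (cur - prev) * (0.5 + tau)

# Symbol n-1 at 0 degrees, symbol n at 135 degrees
a0, b0 = 1.0, 0.0
a1, b1 = math.cos(3 * math.pi / 4), math.sin(3 * math.pi / 4)
for tau in (-0.1, 0.0, 0.1):      # early, on time, late sampling
    e = psk8_clock_error(a0, a1, b0, b1, linear_mid(a0, a1, tau), linear_mid(b0, b1, tau))
    print(tau, round(e, 4))
```

Under the linear model the error equals $2\tau(P'^2_n + Q'^2_n)$, so large transitions dominate the estimate, which is the weighting the text describes.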

4.5.4 OFFSET-QPSK DEMODULATION 

4.5.4.1 Acquisition Processing

The preamble for offset-QPSK must be different from the QPSK alternating preamble; otherwise acquisition fails. This can be demonstrated as follows.

For the alternating preamble, and because of the half-symbol offset on the Y channel, the transmitted signal during the preamble has the form:

$$X(t) = \sin \pi R_s t \tag{64}$$

$$Y(t) = \cos \pi R_s t \tag{65}$$

After mixing with the receiver's oscillator, which has a phase offset $\theta$, and with a clock misalignment $\varepsilon/2$, the following results:

$$X = \sqrt{2C}\left[\sin(\pi R_s t + \varepsilon/2)\cos\theta - \cos(\pi R_s t + \varepsilon/2)\sin\theta\right] \tag{66}$$

$$Y = \sqrt{2C}\left[\sin(\pi R_s t + \varepsilon/2)\sin\theta + \cos(\pi R_s t + \varepsilon/2)\cos\theta\right] \tag{67}$$

Using well-known trigonometric identities, these equations may be rewritten as:

$$X = \sqrt{2C}\,\sin(\pi R_s t + \varepsilon/2 - \theta) \tag{68}$$

$$Y = \sqrt{2C}\,\cos(\pi R_s t + \varepsilon/2 - \theta) \tag{69}$$

Thus $(\varepsilon/2 - \theta)$ can be determined, but $\varepsilon/2$ and $\theta$ cannot be determined separately, and the acquisition process fails. Therefore a different preamble must be used.
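The ambiguity can be confirmed numerically. The Python sketch below evaluates Equations 66 and 67 for two different pairs of clock misalignment and phase offset that share the same value of $\varepsilon/2 - \theta$; the function name `alternating_preamble_rx`, the unit signal power and the test values are illustrative assumptions.

```python
import math

def alternating_preamble_rx(t, Rs, eps, theta, C=0.5):
    """Received X, Y for the alternating offset-QPSK preamble, Eqs. (66)-(67)."""
    a = math.pi * Rs * t + eps / 2
    amp = math.sqrt(2 * C)
    x = amp * (math.sin(a) * math.cos(theta) - math.cos(a) * math.sin(theta))
    y = amp * (math.sin(a) * math.sin(theta) + math.cos(a) * math.cos(theta))
    return x, y

Rs = 1e6
times = [k / (2 * Rs) for k in range(16)]            # 2 samples per symbol
rx1 = [alternating_preamble_rx(t, Rs, eps=0.4, theta=0.1) for t in times]
rx2 = [alternating_preamble_rx(t, Rs, eps=0.6, theta=0.2) for t in times]
# Both pairs give eps/2 - theta = 0.1, so the two sample streams coincide
print(max(abs(u - v) for p, q in zip(rx1, rx2) for u, v in zip(p, q)))
```

No processing of these samples can separate the two cases, which is why the preamble itself must change.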

A suitable alternative is the alternating 45°, -45° sequence given by:

$$A_n = 1 \quad \text{(constant)} \tag{70}$$

$$B_n = (-1)^n \quad \text{(alternating)} \tag{71}$$

For the even samples this yields:

$$X_{2n} = \sqrt{2C}\left[\cos\theta - (-1)^n\cos(\varepsilon/2)\sin\theta\right] \tag{72}$$

$$Y_{2n} = \sqrt{2C}\left[\sin\theta + (-1)^n\cos(\varepsilon/2)\cos\theta\right] \tag{73}$$

and for the odd samples we get

$$X_{2n-1} = \sqrt{2C}\left[\cos\theta - (-1)^n\sin(\varepsilon/2)\sin\theta\right] \tag{74}$$

$$Y_{2n-1} = \sqrt{2C}\left[\sin\theta + (-1)^n\sin(\varepsilon/2)\cos\theta\right] \tag{75}$$

For carrier acquisition, simply add all the Y samples over the first half of the preamble, and similarly the X samples, and take the arctangent of the ratio of the Y sum to the X sum; over an even number of symbols the alternating terms cancel, leaving $\tan\theta$. Do the same for the second half of the preamble, and then proceed as for the QPSK case.

For clock acquisition, begin by determining the mean signal value at the nth symbol interval from the 4 complex samples over this and the preceding symbol, as follows:

$$m_{X,n} = \tfrac{1}{4}\left(X_{2n} + X_{2n-1} + X_{2n-2} + X_{2n-3}\right) = \sqrt{2C}\cos\theta_n \tag{76}$$

$$m_{Y,n} = \tfrac{1}{4}\left(Y_{2n} + Y_{2n-1} + Y_{2n-2} + Y_{2n-3}\right) = \sqrt{2C}\sin\theta_n \tag{77}$$

Note that the means must be calculated for each value of n because, in the presence of a frequency offset ($\dot\theta \neq 0$), $\theta$ varies with n.

Next, subtract the means from the original samples to obtain new quantities as follows:

$$p_{2n} = X_{2n} - m_{X,n} = -\sqrt{2C}\,(-1)^n\cos(\varepsilon/2)\sin\theta \tag{78}$$

$$q_{2n} = Y_{2n} - m_{Y,n} = \sqrt{2C}\,(-1)^n\cos(\varepsilon/2)\cos\theta \tag{79}$$

$$p_{2n-1} = X_{2n-1} - m_{X,n} = -\sqrt{2C}\,(-1)^n\sin(\varepsilon/2)\sin\theta \tag{80}$$

$$q_{2n-1} = Y_{2n-1} - m_{Y,n} = \sqrt{2C}\,(-1)^n\sin(\varepsilon/2)\cos\theta \tag{81}$$

Next, proceed with these p and q samples in the same way as with the X and Y samples for QPSK.

The preprocessing given above, which removes the sample means, is needed for offset-QPSK (unlike QPSK) because of the different nature of the preamble. Once the sample means are removed, the remainder of the processing parallels that of QPSK.
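The carrier-phase step and the mean-removal preprocessing can be sketched together. The Python below synthesises noise-free preamble samples from Equations 72 to 75 (with a constant phase, i.e. no frequency offset), estimates $\theta$ from the arctangent of the summed Y and X samples, and forms the mean-removed p and q samples of Equations 76 to 81. Function names and parameter values are hypothetical; noise and the two-half-preamble frequency estimate are omitted.

```python
import math

def preamble_samples(n_sym, eps, theta, C=0.5):
    """Noise-free samples of Eqs. (72)-(75), keyed by sample index (2n even, 2n-1 odd)."""
    amp = math.sqrt(2 * C)
    X, Y = {}, {}
    for n in range(1, n_sym + 1):
        s = (-1) ** n
        X[2 * n] = amp * (math.cos(theta) - s * math.cos(eps / 2) * math.sin(theta))
        Y[2 * n] = amp * (math.sin(theta) + s * math.cos(eps / 2) * math.cos(theta))
        X[2 * n - 1] = amp * (math.cos(theta) - s * math.sin(eps / 2) * math.sin(theta))
        Y[2 * n - 1] = amp * (math.sin(theta) + s * math.sin(eps / 2) * math.cos(theta))
    return X, Y

def carrier_phase(X, Y, indices):
    """Arctangent of summed Y over summed X; alternating terms cancel over an even symbol count."""
    return math.atan2(sum(Y[k] for k in indices), sum(X[k] for k in indices))

def remove_means(X, Y, n):
    """Mean-removed samples (p_2n, q_2n, p_2n-1, q_2n-1) for symbol n >= 2, Eqs. (76)-(81)."""
    mx = (X[2 * n] + X[2 * n - 1] + X[2 * n - 2] + X[2 * n - 3]) / 4   # Eq. (76)
    my = (Y[2 * n] + Y[2 * n - 1] + Y[2 * n - 2] + Y[2 * n - 3]) / 4   # Eq. (77)
    return X[2 * n] - mx, Y[2 * n] - my, X[2 * n - 1] - mx, Y[2 * n - 1] - my

eps, theta = 0.5, 0.3                       # hypothetical clock and phase offsets
X, Y = preamble_samples(16, eps, theta)     # one half of a hypothetical 32-symbol preamble
theta_est = carrier_phase(X, Y, range(1, 33))
p_e, q_e, p_o, q_o = remove_means(X, Y, 5)
print(theta_est, p_e, q_e, p_o, q_o)
```

After mean removal the p and q samples carry only the alternating component, in the same form as the QPSK X and Y preamble samples, which is what allows the QPSK clock acquisition to be reused.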

4.5.4.2 Tracking Processing

After acquisition has been achieved, tracking proceeds as for QPSK once the X samples are delayed by one sample to align them with the Y samples.

From the above discussion, the tracking processing for offset-QPSK is almost identical to that of QPSK. The acquisition processing, on the other hand, needs some preprocessing, after which it proceeds in the same way as for QPSK.

4.5.5 SUMMARY 

The overall conclusion drawn from examining the various demodulators for PSK signals is that the processing involves the same types of computations, and that it is entirely feasible to build one generic digital demodulator that can be programmed off-line via microprocessor control to demodulate BPSK, QPSK, 8-PSK or offset-QPSK signals. Digital implementation of the demodulator for MSK and SMSK has not yet been considered in detail; however, apart from differences in the computational procedures, their implementation should be achievable with the same approach used for the modulation types treated here.
5.0 TECHNOLOGY SURVEY

5.1 GENERAL 

Because of the high speed requirements of the on-board processor, and because power is at a premium on board the satellite, the implementation technology must provide high speed, low power consumption and a high level of integration. COMSAT LAB engineers surveyed commercial static RAMs and multipliers by contacting manufacturers of high speed digital devices. The results are summarized in Tables 5.1 to 5.4. This information has been helpful in estimating the power requirements of the various parts of the on-board processor and in carrying out trade studies between power requirements and performance.

Of paramount importance, however, is the use of radiation-hardened devices. Unshielded devices in space are exposed to several hundred krads per year (one rad corresponds to the absorption of 100 ergs per gram of material). Proper shielding is essential, although the high launch cost per pound discourages extensive shielding of electronic devices in satellites.

The use of proper grounding and coupling techniques in the design of devices is very important to reduce the effects of radiation. An example is the insertion of resistors in the feedback paths of cross-coupled bistable circuit elements to dissipate the energy imparted by high-energy particle radiation and prevent an undesired change of state. Proper shielding, grounding and coupling techniques go hand in hand with the use of radiation-hardened devices.

Three failure modes can be attributed to radiation exposure: functional failures, parametric failures and single-event upsets.

A functional failure is a failure to operate properly. Parametric failures occur when a device no longer meets its data-sheet specifications, although it may continue to function properly. A single-event upset occurs when a high-energy particle imparts sufficient energy to a bistable circuit to change its state. Clearly, the concern here is with the total dose of radiation as well as the dose rate. Both Si-based and GaAs-based technologies are promising for applications requiring high speed, low power consumption and radiation-hardened devices.

5.2 SILICON TECHNOLOGY 

First consider Si-based technologies. Several IC manufacturers are involved in producing CMOS and CMOS/SOS radiation-hardened devices. Based on examination of the available manufacturers' information, the most pertinent data are presented in Table 5.5. These data indicate that the power requirements and speeds of radiation-hardened components are comparable to those of their non-radiation-hardened counterparts.

However, the level of integration of high speed radiation-hardened components is still low. High levels of integration have been achieved at somewhat lower speeds. One example is the Harris 80C86RH, a 16-bit CMOS microprocessor that provides a total-dose hardness level as great as 1 Mrad, consumes only 0.05 W/MHz and operates at clock frequencies up to 5 MHz.

5.3 VHSIC TECHNOLOGY 

For military use, the very high speed integrated circuit (VHSIC) Phase I program, sponsored by the Office of the Secretary of Defense (OSD), addressed the objective of providing radiation-hard, high speed, silicon 1.25-μm technology integrated circuits for application in military systems. The VHSIC Phase II program extends the requirement for radiation-hardened electronics to the 0.5-μm design rule regime.

Under the VHSIC program, several contractors have been developing radiation-hardened gate arrays, memories and special purpose chips operating at frequencies above 25 MHz with modest power consumption. CMOS technologies have been developed at Westinghouse, NMOS at IBM, CMOS/SOS at Hughes, and 3D bipolar as well as CMOS at TRW.

Some highly integrated chips of great interest to this study came out of these efforts, among them multiport memories and high speed programmable matrix switches. 64K SRAMs with access times of 35 ns, and 8K CMOS/SOS configurable gate arrays operating at speeds above 25 MHz with less than 1/2 W power dissipation, are also among the achievements of the VHSIC program relevant to digital on-board processing. With the pipeline FFT as the workhorse of the demux/demod architecture, special attention was paid to recent developments in high speed FFTs in the VHSIC (as well as non-VHSIC) areas. IBM and TRW are among the leaders in this area.

IBM has produced a complex multiplier accumulator (CMAC) NMOS chip that operates at 25 MHz. This chip is used for the butterfly computations of a radix-4 FFT. However, instead of computing the individual butterflies as 4-point FFTs, it computes them as 4-point DFTs. This results in 16 rather than 3 complex multiplications per butterfly, more than five times the multiplications (and hence the multiplication power) actually needed. The maximum throughput rate of the IBM CMAC FFT processor is only 6.25 MHz complex; thus it takes 164 μs to compute a 1024-point transform. It is therefore concluded that IBM's highly integrated CMAC chip was designed as a general purpose complex multiplier accumulator and was not tailored for FFT applications.
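The IBM figures can be reproduced with simple arithmetic. The sketch below counts complex multiplications for a 1024-point radix-4 FFT, taking 3 twiddle multiplications per butterfly (as the text does) versus 16 for a direct 4-point DFT, and computes the time to stream one transform at a given complex sample throughput. It ignores trivial twiddles and pipeline fill, and the function names are our own.

```python
import math

def radix4_mults(n_points, mults_per_butterfly):
    """Complex multiplications for an n-point radix-4 FFT (n a power of 4)."""
    stages = round(math.log(n_points, 4))
    butterflies = (n_points // 4) * stages
    return butterflies * mults_per_butterfly

def transform_time_us(n_points, complex_rate_hz):
    """Time to pass one n-point transform at one complex sample per clock."""
    return n_points / complex_rate_hz * 1e6

fft_mults = radix4_mults(1024, 3)     # butterflies computed as 4-point FFTs
dft_mults = radix4_mults(1024, 16)    # butterflies computed as 4-point DFTs
print(fft_mults, dft_mults, dft_mults / fft_mults)
print(transform_time_us(1024, 6.25e6))
```

The ratio of 16/3, about 5.3, is the excess that the text attributes to computing each butterfly as a DFT.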

TRW, on the other hand, has produced 2 CMOS chips specifically designed for butterfly computations as part of the VHSIC program. The first, called the FFT arithmetic unit (FFTAU), is about 1 x 1 inch, has 105 pins and consumes 0.9 W. The other, called the FFT control unit (FFTCU), is also about 1 x 1 inch, has 105 pins and consumes 0.4 W. These 2 chips operate in conjunction with 4-port RAMs in a radix-2, decimation-in-time, in-place FFT architecture. Because this architecture lacks pipelining, the maximum clock frequency is only 16.7 MHz. Nonetheless, this is substantially higher than the 6.25 MHz of the IBM CMAC FFT. TRW chips are also more radiation-hardened because of TRW's greater emphasis on space applications.

A faster FFT architecture found in the technology survey was also from TRW, but not part of the VHSIC program. By using pipeline architectures like the ones outlined in the text, FFT throughput rates higher than 20 MHz were achieved. The power consumption of a 512-point, 20-MHz complex CMOS FFT was about 100 W. This figure is high for two reasons. The first is that it uses 32-bit floating point arithmetic. Floating point arithmetic consumes more power than fixed point arithmetic and is needed only in certain applications requiring very large dynamic ranges (our demultiplexer is not one of them). The second is the level of integration. Before the end of the decade, much higher levels of integration are expected, and it will be possible to put a 1024-point or larger FFT on a single wafer, resulting in a drastic decrease in weight and power consumption. Today's pipeline FFTs (such as TRW's and IBM's) consume much power because higher levels of integration are yet to be achieved.

Comsat Labs has begun implementing a fixed point pipeline FFT processor with a throughput greater than 20 M complex samples/s, and a great deal of experience has been accumulated in this area. This fixed point technology will be more power efficient than the floating point implementations and hence more suitable for on-board use. High level integration of this approach should be pursued to achieve further reductions in power and size.
5.4 GaAs TECHNOLOGY

Consider now GaAs-based technologies. On the positive side, GaAs digital circuits are capable of very high speed operation at low power and possess a high tolerance to radiation. On the negative side, cost has become a critical issue as a result of low yields, and high levels of integration are yet to be achieved. Provided R&D continues, it is only a matter of time until yields improve and integration levels increase.

Facilities to produce LSI GaAs digital devices are being established with DARPA funding at Rockwell, McDonnell Douglas and Honeywell. Rockwell has developed 4-kbit SRAMs with access times of 5 ns, and Honeywell is projecting a 4K, 1-ns memory with a maximum power dissipation of 1 W by the end of 1987. GaAs gate arrays operating at frequencies above 1 GHz are also being produced with power dissipation of less than 200 μW/gate.

The application of GaAs FET (field effect transistors) technology in 
radiation environments is attractive because of the high tolerance of 
MESFET (metal semiconductor FET) devices to total ionizing dose (10® to 
10® rads). There is little information available on single event upsets in 
GaAs ICs, but the reports published so far are very promising. 

NASA has also entered the GaAs digital arena with a program for an 
adaptable, programmable processor targeted for high speed processing of 
on-board space sensor data. 

The conclusion from our technology survey is that in the near future high speed, low power digital signal processing will be based mainly on Si technologies (CMOS, CMOS/SOS), with GaAs used mostly for high speed memories and at the analog-to-digital interface. In the longer term, as a result of continuing R&D in GaAs, a new generation of high speed digital signal processing devices with enhanced radiation resistance will emerge. This can easily happen by the 1995 to 2005 time frame, in which an operational satellite incorporating flight-worthy hardware based on the concepts put forth in this study is likely to appear.

In the immediate future, proof-of-concept laboratory units can be 
constructed from existing commercially available Si components and 
experimental components being developed as a result of the VHSIC 
program. 
Table 5.1 8 x 8 Multipliers

Technology   Manufacturer     Multiply    Power      Power for One
Family                        Time (ns)   (mW)       Multiplication/ns (W)
CMOS         Analog Devices   85          75         6.4
CMOS         TRW              45          31         1.4
GaAs         Gigabit Logic    10          500 (50)   5 (0.5)
GaAs         Rockwell         5.25        2,200      12
GaAs         Toshiba          12          160        1.1
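The last column of Table 5.1 appears to be the power needed to sustain one multiplication per nanosecond by running enough devices in parallel, that is, multiply time (ns) times power per device. That reading is our assumption; it reproduces the TRW, Gigabit Logic and Rockwell entries, as the short Python check below shows (the function name is hypothetical).

```python
def watts_per_mult_per_ns(multiply_time_ns, power_mw):
    """Devices needed in parallel (= multiply time in ns) times power each, in W."""
    return multiply_time_ns * power_mw / 1000.0

rows = {"TRW": (45, 31), "Gigabit Logic": (10, 500), "Rockwell": (5.25, 2200)}
for name, (t_ns, p_mw) in rows.items():
    print(name, round(watts_per_mult_per_ns(t_ns, p_mw), 2))
```

On this reading a slow but frugal part (TRW CMOS) can beat a fast but power-hungry one (Rockwell GaAs) by almost an order of magnitude in power per unit throughput.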


Table 5.2 16 x 16 Multipliers

Technology   Manufacturer     Multiply    Power
Family                        Time (ns)   (mW)
NMOS         Bell             20          1,000
CMOS         TRW              165         500
CMOS         Analog Devices   75          175
CMOS         NEC              45          100
CMOS/SOS     Toshiba          27          150
GaAs         Fujitsu          10.5        950
Table 5.3 1 kbit of RAM

Technology   Manufacturer     Access      Power
Family                        Time (ns)   (mW)
ECL          Fairchild        10          940
ECL          NTT              0.85        950
CMOS         Cypress          15          450
GaAs         Fujitsu          1.3         300
GaAs         NEC              6           38
GaAs         Gigabit Logic    2           1,500
HEMT         Fujitsu          3.4         290



Table 5.4 4 kbits of RAM

Technology   Manufacturer     Access      Power
Family                        Time (ns)   (mW)
ECL          Fujitsu          3.2         750
ECL          NEC              2.3         400
ECL          NTT              1.1         980
ECL          Hitachi          2.5         250
MOS          Bell             5.0         100
GaAs         NTT              2.8         300
GaAs         Fujitsu          3.0         175
HEMT         Fujitsu          4.4         215
Table 5.5 Radiation Hardened CMOS and CMOS/SOS

SRAMs

Size      Technology   Manufacturer   Access      Power
(kbits)                               Time (ns)   (mW)
16        CMOS         Honeywell      110         1,000
16        CMOS         Harris         100         600
64        CMOS         Harris         220
4         CMOS/SOS     CTI            70          125
16        CMOS/SOS     CTI            100         400

GATE ARRAYS

Number     Technology   Manufacturer   Delay       Power
of Gates                               Time (ns)   (mW)
3,500      CMOS         Honeywell      2           500
4,000      CMOS         Harris         2
3,000      CMOS/SOS     CTI            2           480
6.0 RECOMMENDATIONS

6.1 GENERAL 

This report describes an architecture for a flexible, modular, digital demultiplexer/demodulator for space applications. The building blocks of the architecture are pipeline processors for forward and inverse FFTs, a digital adaptive interpolating filter, and a generic digital demodulator that can be programmed via microprocessor control to demodulate carriers of different modulation types and bit rates. To make the transition from the concept presented in this report to a space-qualified processor, development efforts will be needed in two main areas.

6.2 PROOF OF CONCEPT MODEL 

The first area of development is to build a proof-of-concept model of a Flexible Demultiplexer/Demodulator for bulk demodulation of a wideband channel, such as 40 MHz, with current state-of-the-art components. This will provide a valuable opportunity to work out the complex structural details and control of the Down Converter/Sampler, Pipeline FFT, Carrier Channel Filter, Pipeline IFFT, Interpolator and Demodulator needed to bulk process multiple carriers of different bit rates. Lessons learned from such a model will reveal opportunities to improve the current architecture and will significantly reduce the difficulties and uncertainties that may be encountered in the later evolution to a VLSI-intensive implementation. Computer simulations to support development of such an exploratory hardware model are already underway at COMSAT LABs.

6.2.1 FLEXIBLE BULK DEMUX/DEMOD POC BREADBOARD 

The FLEXIBLE BULK DEMUX/DEMOD POC BREADBOARD would consist of 
the cardinal functional components shown in Figure 6.1 which are 
described briefly below. 

6.2.1.1 DOWN CONVERSION AND SAMPLING

The channel to be processed will be 40 MHz wide and centered at an on-board IF frequency of approximately 3 GHz. The wideband channel will be down converted so that its center is at zero Hz. Complex sampling, using two 40-Msample/s A/D converters operating synchronously but independently on the two quadrature phases, will be incorporated. This is entirely possible with the current state of the art. To test the processor, an arrangement will be provided for representing multiple QPSK carriers with bit rates ranging from 64 kbit/s to 6.144 or 6.3 Mbit/s. Both continuous-duty FDMA and TDMA/FDMA carriers will be represented.


6.2.1.2 FFT PROCESSOR

To accommodate the lowest bit rate carrier, it is necessary that the 
spectrum be divided into frequency coefficients such that a minimum of 
16 occur per carrier. To accomplish this, an FFT capable of resolving 
16384 complex frequency coefficients over the 40 MHz wideband will be 
provided. The FFT processor will be based on a pipeline architecture 
using 25ns, 16x16 bit complex multipliers. Technology at this speed is 
currently emerging. The same FFT processor can accommodate any carrier 
bit rate up to a maximum of approximately 60 Mbit/s for QPSK 
modulation. 
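The numbers in this paragraph are mutually consistent, which the short Python check below makes explicit. It assumes, hypothetically, that a QPSK carrier occupies its symbol rate times $(1 + \alpha)$ with a roll-off $\alpha = 0.25$; the report does not state a roll-off, and the function name is ours.

```python
def fft_plan(bandwidth_hz, n_bins, bit_rate, rolloff=0.25, bits_per_symbol=2):
    """Bin spacing, bins per carrier and sample period for the POC FFT."""
    bin_hz = bandwidth_hz / n_bins                              # frequency resolution
    occupied_hz = bit_rate / bits_per_symbol * (1 + rolloff)   # assumed carrier width
    bins_per_carrier = occupied_hz / bin_hz
    sample_period_ns = 1e9 / bandwidth_hz                       # complex sampling at the channel bandwidth
    return bin_hz, bins_per_carrier, sample_period_ns

bin_hz, bins, period = fft_plan(40e6, 16384, 64e3)
print(round(bin_hz, 1), round(bins, 1), period)
```

Under this assumption the 64 kbit/s carrier spans about 16 bins of about 2.44 kHz each, and the 25-ns sample period matches the 25-ns multipliers cited above.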

6.2.1.3 CARRIER CHANNEL FILTER

This filter processes sets of FFT frequency domain coefficients to 
select the desired channel using a matched filter approach. It can be 
programmed from the ground via the microprocessor controller and clock 
distribution unit to accommodate any arrangement of carrier frequencies 
and bit rates in the wideband channel. Its output is a set of filtered 
frequency domain FFT coefficients representing the information content 
of individual carriers. 

6.2.1.4 INVERSE FFT (IFFT) PROCESSOR

The IFFT processor converts the sets of frequency domain 
coefficients for each carrier back to the time domain. Its implementation 
is such that a single pipeline FFT processor can be shared to perform the 
processing for all of the carriers. To do this, its internal operation and timing are controlled by the microprocessor controller and clock distribution unit according to the distribution of the carriers in the wideband spectrum. This control can be adjusted to accommodate different arrangements of carrier center frequencies and bit rates.
6.2.1.5 INTERPOLATING FILTER

The time domain samples delivered at the output of the IFFT 
processor are timed relative to the clock that controls the demultiplexer 
and this clock is established by the wideband signal sampler located at 
the input to the forward FFT. The time domain samples that are used in 
the demodulator are established by the need to sample the carrier signal 
appearing at the input to the demodulator at twice the symbol rate. 
Furthermore, the phase of the samples must be adjusted according to a 
phase control signal from the demodulator to align the samples at the 
proper positions in each symbol. To accomplish this, a sample 
interpolator will be provided between the IFFT output and the 
demodulator. The interpolation processor uses an impulse response that 
represents additional filtering of the channel and must be carefully 
chosen. 

6.2.1.6 DEMODULATOR

A digitally implemented demodulator architecture for extracting the 
baseband digital information from the filtered carriers will be provided. 

For the POC unit, the demodulator will be implemented for QPSK since 
this is considered to be sufficient for demonstrating the important 
principles involved in the demultiplexing/demodulation processing. This 
requires processing to recover the carrier frequency and phase, the clock 
frequency and phase and the data. The signals are presented to the 
demodulator in the form of discrete time domain samples at a rate of two 
samples on each of two quadrature channels for each symbol interval. 
These samples are processed to recover the modulated data bits. To 
accomplish this, it is necessary to acquire and maintain both symbol 
timing and carrier frequency synchronization. The demodulation 
processor is shared to demodulate all of the carriers. It must be 
controlled by the microprocessor controller and clock distribution unit to 
accommodate the arrangement of carriers and bit rates assigned in the 
wideband channel. 

6.2.1.7 MICROPROCESSOR AND CLOCK DISTRIBUTION UNIT

Operation of the flexible demultiplexer/ demodulator POC unit must 
be tightly synchronized to provide the timing discipline needed to control 
the flow of information within and between its constituent processing 
elements. This unit provides the clocks needed to accomplish this smooth flow. It also provides program control of the clocks, relative timing of
clocks and memory contents needed to adjust the system to accommodate 
different arrangements of carrier center frequencies and bit rates. 

6.2.1.8 TEST FACILITY

The bulk Demux/Demod POC breadboard will include a Test Facility
at both its input and its output. At the input, it will provide a
means of generating an environment of multiple carriers and bit rates in
both FDMA and TDMA formats. It will include a source for generating a
typical bit stream at the bit rates of interest (probably a pseudorandom
bit stream generator) and a means of measuring the BER
encountered when the stream is processed and appears at the output of
the bulk demux/demod. Provision will also be made to introduce the carrier
frequency uncertainty typical of satellite uplink FDMA and
FDMA/TDMA transmissions. It will also contain a means of injecting
thermal noise and cochannel and adjacent channel interference to allow
testing under practical application scenarios.
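A minimal sketch of the pattern source and BER counter follows. It assumes the common PRBS9 pattern (a 9-stage shift register whose 5th- and 9th-stage outputs are added modulo 2 and fed back, polynomial x^9 + x^5 + 1, period 511 bits); the report does not specify which pseudorandom sequence the Test Facility would use. It also assumes the received stream is already aligned with the reference, whereas a practical BER tester must first synchronize its local generator to the incoming pattern. The function names are hypothetical.

```python
def prbs9(n_bits, seed=0x1FF):
    """Generate n_bits of a PRBS9 sequence (feedback from register stages 5 and 9)."""
    state = seed & 0x1FF
    if state == 0:
        raise ValueError("seed must be nonzero")
    out = []
    for _ in range(n_bits):
        # Bit 8 of `state` holds stage 9 and bit 4 holds stage 5.
        fb = ((state >> 8) ^ (state >> 4)) & 1
        out.append(fb)  # the fed-back bit is also used as the output bit
        state = ((state << 1) | fb) & 0x1FF
    return out

def bit_error_rate(tx_bits, rx_bits):
    """Fraction of positions where the received bit differs from the reference bit."""
    if len(tx_bits) != len(rx_bits) or not tx_bits:
        raise ValueError("streams must be non-empty and of equal length")
    errors = sum(t != r for t, r in zip(tx_bits, rx_bits))
    return errors / len(tx_bits)
```

For example, flipping every 100th bit of a 1,000-bit PRBS9 stream gives a measured BER of 0.01, and the generator repeats after 511 bits with 256 ones per period, as expected of a maximal-length sequence.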

6.2.2 PROGRAM SCHEDULE 

The schedule for the Bulk Demux/Demod POC Breadboard development,
spanning 24 months, is shown in Figure 6.2. The work program is divided
into four principal elements:

1. PROCESSOR ARCHITECTURE DEVELOPMENT

2. FLEXIBLE BULK DEMUX/DEMOD HARDWARE DESIGN 

3. TEST BED CONSTRUCTION AND 

4. TEST AND EVALUATION 

Under the Architecture Development element, processing details will
be examined to arrive at structures for the seven major processor
functions illustrated in Figure 6.1, which, when integrated, will meet the
needs for bulk demultiplexing/demodulation of multiple carriers with
multiple bit rates in a 40 MHz wideband channel. Careful attention must
be paid to minimizing the total computation load by using efficient
procedures and algorithms and by distributing the load across multiple
calculating elements to arrive at a practical implementation.
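As a rough illustration of why the load must be distributed, assume (hypothetically) a radix-2 FFT of length N applied to a continuous stream of complex samples at rate f_s, with non-overlapping blocks. Each transform needs (N/2) log2 N butterflies, and a new block arrives every N/f_s seconds, so the required butterfly rate is:

```latex
R_{\mathrm{bf}}
  = \underbrace{\frac{f_s}{N}}_{\text{transforms/s}}
    \cdot
    \underbrace{\frac{N}{2}\log_2 N}_{\text{butterflies/transform}}
  = \frac{f_s}{2}\log_2 N
```

Spread over log2 N pipeline stages with one butterfly unit each, every unit must sustain only f_s/2 butterflies per second, independent of N. Overlapped blocks, or a forward-plus-inverse FFT arrangement, would raise the total load proportionally.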

Computer modeling of key processor functions will be used to study
implementations and to ensure the proper working and interworking of the
components. The tasks under this element are divided into five groupings,
each combining processors illustrated in Figure 6.1 that are logically
associated. The work is time-phased so that the experts needed can be
distributed over all of the processing components. Architecture
development is completed by the 10th month.

Under the Hardware Design element, the resulting architectures are to
be committed to a hardware design. Because of the high speed of the
processor components, care must be taken to maintain a chip layout
that minimizes transport delays and fully accounts for race conditions.
It is expected that compact multilayer printed circuits designed to
minimize reflections will be used extensively. Both VHSIC and other
commercially available VLSI components will be used to achieve
hardware that strikes a practical balance between speed, compactness
and power. Use of high-speed gate array chips will be considered where
costs permit. The effort on the processors will be grouped and
time-staggered in the same manner as for the architectural development
to distribute the workload. Hardware design begins in the 7th month
and is completed by the 17th month.

Flexible Bulk Demux/Demod POC Test Bed construction begins in the
13th month with release of the Down Converter and Sampler design.
Construction will continue on all processing components as design
releases occur and will be completed by the 20th month. Test Facility
design and construction will be initiated in the 16th month, and its
completion will coincide with the completion of the Flexible Bulk
Demux/Demod. The test bed will be built to a standard suitable for use
as a laboratory test and evaluation tool.

Following completion of construction, a rigorous Test and Evaluation 
Program will be performed from the 20th to the 24th month. Tests will be 
designed to evaluate the performance of the Flexible Bulk Demux/Demod 
under the frequency uncertainty, interference, noise and signal fade 
environment characteristic of satellite onboard signal reception expected 
at Ka band. 

A final report will be prepared and delivered 2 months after
completion of the work. It will contain full documentation of the
architecture, design and construction, and the results of the test and
evaluation. It will provide sufficient information to create a design plan
for a space flight model. It will also contain a technology update on
components becoming available that may promise further improvements
in the design of the processor and its radiation robustness.

The POC model is intended to provide an opportunity to develop the
details of the flexible demultiplexer/demodulator processor architecture
using available components and to provide a vehicle for gaining
experience with its operating principles. Hence the emphasis should be
on precise inspection of the fine details of the processor and its
control. Power and weight are also important considerations, but in this
effort they should be subordinate to the need to define the most
efficient processing system, one that can later be implemented with
advanced VLSI components to minimize power and weight.

6.3 SEMICONDUCTOR TECHNOLOGY 

The second area where NASA should direct its R&D resources is in
advancing the state of the art of semiconductor technology as it relates
to the on-board processor. Advances in technology stimulated by such a
program will also have spillovers in both civilian and military areas, as
has happened many times in past NASA-sponsored programs. Specific
recommendations on areas of technology where efforts need to be
directed are given below.

6.3.1 HIGHLY INTEGRATED CHIPS 

Development of chips with high levels of integration that are
designed to perform specific tasks is essential. The butterfly operation
of the FFT is a good example. Rather than using individual multiplier and
adder chips combined with memory and control chips as the building
blocks of a butterfly element, large savings in power, weight and size
can be realized with a single butterfly chip that embodies all of these
functions. This level of integration is certainly within the realm of
today's technology, an example being IBM's complex multiply-accumulator
(CMAC) chip. What is needed is a proven formula for the implementation,
such as the one that can be realized by carrying the directions outlined
in this report to the next phase, namely the construction of a
discrete-component proof-of-concept model.
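For reference, the radix-2 decimation-in-time butterfly combines two complex inputs a and b with a twiddle factor W as A = a + Wb and B = a - Wb. The sketch below uses hypothetical function names and Python complex arithmetic as a stand-in for fixed-point hardware; it shows the operation, its arithmetic cost, and how four butterflies compose a 4-point DFT. This is the set of operations a dedicated butterfly chip would fuse into one device.

```python
def butterfly(a, b, w):
    """Radix-2 decimation-in-time butterfly: returns (a + w*b, a - w*b)."""
    t = w * b            # one complex multiply: 4 real multiplies + 2 real adds
    return a + t, a - t  # one complex add and one complex subtract: 4 real adds

def fft4(x0, x1, x2, x3):
    """4-point DFT built from four radix-2 DIT butterflies."""
    s0, s1 = butterfly(x0, x2, 1)    # first stage pairs x0/x2 and x1/x3
    s2, s3 = butterfly(x1, x3, 1)
    X0, X2 = butterfly(s0, s2, 1)    # second stage twiddles: W4^0 = 1, W4^1 = -j
    X1, X3 = butterfly(s1, s3, -1j)
    return [X0, X1, X2, X3]
```

For example, `fft4(1, 2, 3, 4)` returns `[10, (-2+2j), -2, (-2-2j)]`, matching the direct 4-point DFT.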

6.3.2 FFT ELEMENTS 

The FFT plays such a fundamental role in digital signal processing
that the design and fabrication of a special purpose chip to perform its
fundamental operation, i.e. the butterfly, is a very sensible objective. For
high-speed real-time applications, a pipeline FFT similar to the one
discussed in this report is often needed. Such a pipeline processor
requires, in addition to the butterfly elements, a commutator element
involving delays of various sizes as well as switches to control it. Again,
a special purpose chip (as opposed to a combination of several chips with
low levels of integration) to perform the commutator action would
significantly reduce the power and weight of the processor. Indeed,
provided that adequate funding is available, it should be possible to build
an entire pipeline processor on a single wafer before the end of this
decade.
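The stage structure of such a pipeline can be mimicked in software. The sketch below is a radix-2 decimation-in-frequency FFT written stage by stage: in successive stages the butterfly partners are N/2, N/4, ..., 1 samples apart, which correspond to the delay lengths a delay-commutator pipeline uses to bring the partners together in time. This reflects one common pipeline organization and is an assumption, not the report's exact design; it models only the arithmetic, not the clocking, and the function names are hypothetical.

```python
import cmath

def bit_reverse(i, n_bits):
    """Reverse the lowest n_bits bits of integer i."""
    r = 0
    for _ in range(n_bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def staged_dif_fft(x):
    """Radix-2 decimation-in-frequency FFT computed one pipeline-like stage at a time."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = [complex(v) for v in x]
    span = n // 2  # partner distance per stage: N/2, N/4, ..., 1 (the stage delay length)
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))  # twiddle W_{2*span}^k
                a, b = data[start + k], data[start + k + span]
                data[start + k] = a + b
                data[start + k + span] = (a - b) * w
        span //= 2
    # A DIF pipeline emits results in bit-reversed order; restore natural order.
    n_bits = n.bit_length() - 1
    out = [0j] * n
    for i, v in enumerate(data):
        out[bit_reverse(i, n_bits)] = v
    return out
```

For an impulse input, `staged_dif_fft([1, 0, 0, 0])` returns a flat spectrum of four ones, and for arbitrary power-of-two inputs the result agrees with a direct DFT to within rounding error.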

6.3.3 MICROPROCESSORS 

Another area where technological advances are needed is high-speed,
low-power, radiation-hardened microprocessors. Harris has produced a
32-bit radiation-hardened microprocessor operating on a 5 MHz clock and
consuming as little as 0.05 W/MHz.

A great variety of timing and control functions will need to be performed
at high speed on board the satellite, and microprocessors working on
faster clocks will be needed.
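Read as a power-per-clock-rate figure, and assuming power scales linearly with clock frequency (a simplification that is roughly true of CMOS dynamic power), the quoted rating corresponds to:

```latex
P \approx 0.05~\mathrm{W/MHz} \times 5~\mathrm{MHz} = 0.25~\mathrm{W}
```

Under the same linear assumption, the faster clocks called for above would raise the power proportionally, for example to about 1 W at a hypothetical 20 MHz clock.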

6.3.4 GaAs TECHNOLOGY 

An area that is showing great promise is GaAs technology with its 
high speed, low power and high radiation resistance. Efforts to increase 
the packing density of GaAs chips are needed before the benefits of this 
technology can be fully reaped. In the area of memory storage 
(particularly PROMs) the high immunity of GaAs to single event upsets 
(that could reverse a stored 1 into a 0 or vice versa) makes it particularly 
attractive. Work is needed to produce large GaAs memories on smaller 
chips. 

6.3.5 A/D CONVERTERS 

At the analog-to-digital interface, today's A/D converters operating
at speeds above 50 MHz can provide 8 bits of quantization with good
linearity. To process wideband transponders of 80 MHz or more and
resolve them into hundreds of narrowband carriers, a large dynamic
range calling for A/Ds of 10-12 bits will be needed. This is particularly
true when operating at Ka band, where severe fades occur. Work has been
going on in this area for many years using bipolar technology and, more
recently, CMOS and GaAs, and it should continue.
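The link between dynamic range and converter resolution can be made concrete with the textbook ideal-quantizer formula SQNR = 6.02N + 1.76 dB, which assumes a full-scale sine input, uniform quantization and no other converter impairments. The sketch below, with hypothetical function names, computes it and inverts it to find the resolution needed for a target range; real converters fall short of these ideal figures.

```python
import math

def ideal_sqnr_db(bits):
    """Ideal SQNR of an N-bit quantizer for a full-scale sine: 20*log10(2**N) + 10*log10(1.5)."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

def bits_for_range(range_db):
    """Smallest resolution whose ideal SQNR meets range_db."""
    step = 20 * math.log10(2)       # about 6.02 dB per bit
    offset = 10 * math.log10(1.5)   # about 1.76 dB
    return math.ceil((range_db - offset) / step)
```

Under these ideal assumptions, 8 bits give about 50 dB and each extra bit adds about 6 dB, so moving to 10 to 12 bits buys roughly 12 to 24 dB of additional range; a 60 dB requirement calls for 10 bits and 72 dB for 12 bits.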

6.4 SUMMARY 

The areas of development outlined above are important to achieving
the technological advances needed to establish a position to build a
space-qualified advanced digital processor. Clearly, progress in these
technological areas will have benefits reaching far beyond any one
particular program. More project-oriented efforts should focus on
building an exploratory model of the digital processor as a stepping stone
before embarking on a more elaborate and costly VLSI implementation.
Such an effort should go hand in hand with efforts on the technology
development side so that in a few years both areas will have matured
enough to realize a very sophisticated on-board processor. Improving
the implementation algorithms in parallel with advancing the technology
needed to realize them is the real secret to the successful realization
of the advanced on-board processing machines of the future. It is
important that these developments be pursued vigorously, with the goal
of a practical implementation by 1995, if the satellite communications
industry is to make use of the technology in the next generation of
commercial satellites.